


Vol. 47 FEBRUARY, 1956 No. 2


The Journal of Educational Psychology

CONTENTS 
Effects of Coöperation and Competition on the Cohesiveness of Small
Face-to-Face Groups.............................................. 65


BEEMAN N. PHILLIPS AND LOUIS D’AMICO 
Some Relationships Between Methods of Instruction, Personality 


Variables, and Problem-Solving Behavior...................... 71
IRVING MALTZMAN, EUGENE EISMAN AND LLOYD O. BROOKS
Student Achievement as a Measure of Instructor Effectiveness. ...... 79 


JOSEPH E. MORSH, GEORGE G. BURGESS, AND PAUL N. SMITH 
Personality Factors of Over- and Under-Achievers in Engineering.... 89 
ELVA BURGESS 


A Research Checklist in Educational Psychology.................. 100 
PERCIVAL M. SYMONDS 

General Response Pattern to Five-Choice Items.................... 110
EDWARD L. CLARK 

Stimulus Familiarization as a Factor in Ideational Learning....... 118 
WALTER WEISS AND BERNARD J. FINE 

NS i er Re MSS ne ai aereacas 2 Saale be 125 


Published monthly except June to September 


$7.00 a year in U. S. and Pan America; Canada, $7.20; other countries, $7.40.
Single issues $1.20 


WARWICK & YORK, INC. 


BALTIMORE 2, MD. 


Entered as Second Class Matter Nov. 15, 1921, at the Post Office at Baltimore, Md. 
under the Act of March 3, 1879. 








THE JOURNAL OF 
Educational Psychology 


Established 1910


BOARD OF EDITORS*
Wm. Clark Trow, Chairman
University of Michigan


Stephen M. Corey          J. B. Stroud
Teachers College          State University of Iowa
Columbia University


*Appointed by the American Psychological Association. 








EDITORIAL CONSULTANTS 


Paul Blommers          H. H. Remmers          Miles A. Tinker
Chester W. Harris      Percival M. Symonds    Alexander G. Wesman
Harold E. Jones        Paul A. Witty


THE JOURNAL OF EDUCATIONAL PSYCHOLOGY is devoted primarily to
the scientific study of problems of learning, teaching, and measurement
of the psychological development of the individual. THE JOURNAL
will contain articles on the following subjects: the psychology of school 
subjects; experimental studies of learning; the development of interests, 
attitudes, and personality, particularly as related to school adjustment; 
emotion, motivation, and character; mental development and methods. 
This last will include tests, statistical techniques, and research techniques 
in cross-sectional and developmental studies. 





INFORMATION FOR CONTRIBUTORS 


Manuscripts and communications regarding editorial matters should be 
addressed to the Chairman of the Board of Editors, Wm. Clark Trow, 
School of Education, University of Michigan, Ann Arbor, Mich. 

THE JOURNAL has set regulations regarding content and style of material
published, and these should be observed in the preparation of manuscripts
to be submitted.

Tables and graphs. In keeping with the practice of most professional
periodicals of like character, THE JOURNAL now requires authors to bear
part of the added cost resulting from the use of tables, formulas, and 
graphs. Prospective contributors may obtain detailed statement as to 

(Continued on Inside Back Cover) 








THE JOURNAL OF 


EDUCATIONAL PSYCHOLOGY 





Volume 47 February, 1956 Number 2








EFFECTS OF COÖPERATION AND COMPETITION
ON THE COHESIVENESS OF
SMALL FACE-TO-FACE GROUPS¹


BEEMAN N. PHILLIPS 


Indiana State Department of Education 


and 


LOUIS A. D’AMICO 


Xavier University, New Orleans 


In most classrooms children are encouraged to work both coöperatively
and competitively. They are encouraged to get along
with others and are even given grades on their report card for
“coöperativeness,” “helpfulness,” “concern for others,” etc. But
they are also encouraged to compete with others. Goals and standards
are set that only a few can reach, and grades and promotions
are given on a competitive basis.

This mixture of coöperation and competition has caused much
concern among educators in recent years, and an array of conflicting
data has been collected with regard to the effects of coöperation
and competition on the performance of classroom groups.
French (4), Deutsch (2), and Stendler, Damrin, and Haines (10) found
that classroom groups performed better under coöperative conditions,
and Hurlock (6), Leuba (7), Tseng (11), and Whittemore (12)
found that classroom groups performed better under competitive
conditions.

One of the questions which has been raised concerns the effect
of coöperation and competition on group cohesiveness. Research
by Mizuhara and Tamai (8), Grossack (5), and Deutsch (1, 2) suggests
that coöperation and competition are related to group cohesiveness,
but the nature of this relationship is not entirely clear.





1 Based in part on a paper presented at AERA meeting at St. Louis in 
1955. 




One of the implications of their findings is that the cohesiveness of
a group should increase under coöperative conditions and decrease
under competitive conditions. The effect of coöperation and competition
on group cohesiveness was the subject of this study, and the
hypothesis which was tested was as follows: the cohesiveness of a
small face-to-face group will increase under coöperative conditions
and decrease under competitive conditions.


PROCEDURE 


The subjects used in this study were fourth grade children selected
from two schools in a midwest city of 30,000 people. Eight
groups were set up with five children in each group. There were
four high cohesive groups and four low cohesive groups. These
groups were randomly assigned to work under either coöperative
or competitive conditions. Two high cohesive groups worked under
coöperative conditions and two worked under competitive conditions.
Similarly, two low cohesive groups worked under coöperative
conditions and two worked under competitive conditions.

According to Festinger, Schachter, and Back (3) the cohesive- 
ness of a group may be based on personal attraction, prestige, 
and/or goal mediation. In this study group cohesiveness was based 
only on personal attraction, and high and low cohesiveness were 
produced by manipulating the degree to which members of a group 
were attracted to each other. Attraction was measured by a socio- 
metric questionnaire in which each child was asked to choose three 
children in the room that he would like to sit by. The questionnaire 
was administered by the teacher at the beginning of the experiment 
and essentially the same questionnaire was administered at the 
end of the experiment. 

The groups were formed on the basis of the choices made on the 
initial questionnaire. High cohesive groups were formed by putting 
together individuals who had selected each other on the sociomet- 
ric test, while low cohesive groups were formed by putting together 
individuals who had not selected each other. Thus, it was possible 
for a group to have from 0 to 15 within-group choices. The actual 
number of within-group choices for the eight groups on the initial 
and final sociometric questionnaire is shown in Table 1. 
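
The counting rule behind the 0-to-15 range can be sketched in a few lines. This is only an illustration of the definition given above, not the authors' scoring procedure; the function names and the children's names used in any example are hypothetical.

```python
def within_group_choices(group, choices):
    """Count choices that members of `group` direct at other members of `group`.

    `choices` maps each child to the classmates named on the questionnaire
    (three per child in the study). All names are hypothetical.
    """
    members = set(group)
    return sum(
        1
        for child in group
        for chosen in choices.get(child, [])
        if chosen in members and chosen != child
    )


def max_within_group_choices(group_size=5, choices_per_child=3):
    """Upper bound: every choice made by every member falls inside the group."""
    return group_size * min(choices_per_child, group_size - 1)
```

With five children each naming three classmates, `max_within_group_choices()` gives 15, the ceiling stated in the text; a group in which nobody names a fellow member scores 0.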

A change in cohesiveness was defined as a change in the number
of within-group choices between the initial and the final questionnaire.
For example, if a group started out with 10 within-group
choices and had 15 within-group choices at the end of the experiment,
this would indicate that the cohesiveness of the group increased
during the experiment.


TABLE 1—NUMBER OF WITHIN-GROUP CHOICES ON THE INITIAL AND FINAL
SOCIOMETRIC TEST FOR THE EIGHT GROUPS IN THE EXPERIMENT

Group*    Class    Initial    Final    Level of Confidence
HC-CO     A        11         10       —
HC-CO     B         9         11       .01
HC-CM     A        10          7       .01
HC-CM     B        14         14       —
LC-CO     B         0          4       .01
LC-CO     A         0          3       .01
LC-CM     A         3          3       —
LC-CM     B         0          3       .01

* HC = high cohesion; LC = low cohesion; CO = coöperation; CM =
competition.

Coöperation and competition were defined in terms of how members
shared in their group’s rewards. In coöperative groups they
shared equally and in competitive groups they shared in accordance
with their relative contributions.

The experimental task was a modification of the game “Twenty
Questions.” The object of the game was to identify animals by
asking the experimenter questions that could be answered yes-or-no.
The animals were selected from third grade readers used in
the schools. Participation was considered as a regular class assignment,
and every pupil in each room participated even though he
may not have been part of an experimental group.

Each group was taken to the experimental room and was given
the opportunity to identify five animals on each of four successive
days. Two randomly selected lists of animals were used, and the
order of presentation of the animals was varied. Twenty pieces of
candy were distributed among members of each group after identification
of each animal. In the coöperative groups each member
received four pieces of candy each time an animal was identified.
In the competitive groups a scale was used to determine the amount
that each member received. The scale which was used is shown
below:

Type of question                                                  Points
Question identifying correct animal                                  5
Question identifying outstanding characteristics of the animal       4
Question identifying distinctive characteristics of the animal       3
Question identifying common characteristics of the animal            2
Question identifying an incorrect animal                             1


Each question asked by a member of a competitive group was
assigned a point value. These points were totaled and averaged for
each member after an animal was identified correctly. The 20
pieces of candy were distributed in the following pattern: 6-5-4-3-2.
The member with the highest number of points per question
got 6 pieces, the one with the next highest number got 5 pieces,
and so forth.
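
The two reward rules can be written out as a short sketch. The point values and the 6-5-4-3-2 pattern come from the text; the dictionary keys, function names, and the handling of ties and of members who asked no questions are assumptions, since the article does not describe those cases.

```python
# Point values from the scale above; the keys are hypothetical labels.
POINTS = {
    "correct_animal": 5,
    "outstanding_characteristic": 4,
    "distinctive_characteristic": 3,
    "common_characteristic": 2,
    "incorrect_animal": 1,
}
CANDY_BY_RANK = [6, 5, 4, 3, 2]  # competitive pattern, 20 pieces in all


def competitive_shares(questions_by_member):
    """Return candy per member, ranked by mean points per question.

    Assumption: a member who asked no questions averages 0, and tied
    members keep their input order; the source specifies neither case.
    """
    averages = {}
    for member, questions in questions_by_member.items():
        scores = [POINTS[q] for q in questions]
        averages[member] = sum(scores) / len(scores) if scores else 0.0
    ranked = sorted(averages, key=averages.get, reverse=True)
    return {member: CANDY_BY_RANK[i] for i, member in enumerate(ranked)}


def cooperative_shares(members):
    """Every member receives an equal share of the 20 pieces."""
    return {member: 20 // len(members) for member in members}
```

Because the ranking uses points per question rather than total points, a member who asks one well-aimed question can outrank a member who asks many low-value ones.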


RESULTS 


The results of the study are summarized in Table 1. The number
of within-group choices on the initial and final sociometric test are
shown for all groups. The symbols HC and LC indicate high and
low cohesiveness, and the symbols CO and CM indicate coöperation
and competition. Class A had 29 pupils and class B had 38
pupils. In determining whether or not the changes in the number
of within-group choices were statistically significant at the one per
cent level, only approximate probabilities were computed.

Two significant findings are revealed in Table 1. First, groups
which worked under coöperative conditions during the experiment
increased in cohesiveness. In three of the four groups the increase
in the number of within-group choices was significant well beyond
the one per cent level of confidence. This means that individuals
who worked together under coöperative conditions liked each other
better at the end of the experiment than they did at the beginning
of the experiment.

The second significant finding is that groups which worked under
competitive conditions did not necessarily decrease in cohesiveness.
In two groups there was no change in the number of within-group
choices, in one group there was a significant increase in the
number of within-group choices, and in the other group there was
a significant decrease in the number of within-group choices. The
inconclusive nature of these results suggests that whether or not
competition decreases a group’s cohesiveness depends on factors
not specifically controlled in this investigation.

An idea of the nature of such factors was obtained by a further 
examination of the data for competitive groups which revealed 
that rewards were fairly evenly distributed in the groups in which 
there was either no change or an increase in cohesiveness during 
the experiment. This suggested the hypothesis that the effect of 
competition on group cohesiveness is dependent on the effect that 
competition has on the distribution of the group’s rewards. If 
competition results in more or less uniform distribution of the 
group’s rewards, the effect of competition on the group’s cohesiveness
may be similar to the effect that coöperation would have on
the group’s cohesiveness. But on the other hand, if one or two 
members receive most of the group’s rewards the effect of competi- 
tion may be to decrease the group’s cohesiveness. 

One of the implications of these findings is that classroom groups 
which are operated on a coöperative basis can be used to improve
interpersonal relationships. Such a procedure appears to be poten- 
tially the most useful when boys and girls do not know each other 
very well, when it is desirable to break up cliques or transfer 
friendships, or when it is necessary to help a new pupil or a shy 
pupil to establish friendships. 

Another implication of these findings is that competition does 
not necessarily have undesirable effects on interpersonal relation- 
ships. The effect of competition on a classroom group’s inter- 
personal relationships seems to be conditioned by its effect on the 
distribution of the group’s rewards. If the members are well- 
matched and rewards are evenly divided among them, competition 
appears to have fewer undesirable effects on interpersonal relation- 
ships than if members are poorly-matched and rewards are as a re- 
sult not evenly divided among them. In view of this it may be im- 
portant in setting up classroom groups on a competitive basis to 
put individuals together who are more or less equal in ability on 
the task that has been assigned. In this way it is possible to utilize 
the incentive of competition in group work without seriously af- 
fecting member relationships. 

The possible effects of a different type of task, an older group,
and a longer period of time in the experimental situation need to
be considered in additional studies of the effects of coöperation
and competition on group cohesiveness. Additional research is
needed not only to determine the effects of these factors, but in
view of the tremendous increase in the amount of activity that is
carried out in groups, there is also a need for additional research
on other aspects of the behavior of individuals in group situations.


REFERENCES 


1) Morton Deutsch, “A Theory of Co-Operation and Competition,”
Human Relations 2: 129-152, 1949.

2) Morton Deutsch, “The Effects of Cooperation and Competition Upon
Group Process: An Experimental Study,” American Psychologist 4: 263-264,
1949.

3) L. Festinger, S. Schachter, and K. Back, Social Pressures in Informal
Groups, Harper Bros., 1950.

4) John R. P. French, Jr., “Group Productivity,” in Groups, Leadership
and Men, pp. 44-54, edited by H. Guetzkow, Carnegie Press, Carnegie Institute
of Technology, Pittsburgh, Pennsylvania, 1951.

5) Martin Myer Grossack, The Effect of Cohesiveness, Social Influence,
and Communication, unpublished Doctor’s thesis, Boston University, Boston,
1952, 131 pp.

6) E. B. Hurlock, “The Use of Group Rivalry as an Incentive,” Journal
of Abnormal and Social Psychology 22: 278-290, 1927.

7) C. Leuba, “An Experimental Study of Rivalry in Young Children,”
Journal of Comparative Psychology 16: 367-378, 1933.

8) Taisuke Mizuhara and Syusuke Tamai, “Experimental Studies of Cooperation
and Competition,” Japanese Journal of Psychology 22: 124-127,
1952, in Psychological Abstracts, vol. 27, no. 7, July, 1953, p. 519.

9) S. Schachter et al., “An Experimental Study of Cohesiveness and
Productivity,” Human Relations 4: 229-238, 1951.

10) Celia Stendler, Dora Damrin, and Aleyne C. Haines, “Studies in
Cooperation and Competition: I. The Effect of Working for Group and Individual
Rewards on the Social Climate of Children’s Groups,” Journal of
Genetic Psychology 79: 173-197, 1951.

11) Sing-Chu Tseng, An Experimental Study of the Effect of Three Types
of Distribution of Reward Upon Work Efficiency and Group Dynamics, Ed.D.
thesis, Teachers College, Columbia University, 1952, 74 pp.

12) I. C. Whittemore, “The Influence of Competition on Performance:
An Experimental Study,” Journal of Abnormal and Social Psychology 19:
236-253, 1924.








SOME RELATIONSHIPS BETWEEN METHODS
OF INSTRUCTION, PERSONALITY
VARIABLES, AND
PROBLEM-SOLVING BEHAVIOR¹


IRVING MALTZMAN, EUGENE EISMAN² and LLOYD O. BROOKS


University of California at Los Angeles 


Maier (5) has distinguished between two kinds of problem-solv- 
ing, reproductive and productive. The former presumably involves 
the transfer of previously acquired responses to new situations on 
the basis of equivalent stimuli. Productive thinking involves the 
integration of previously isolated experiences and the production 
of subjective equivalence in new situations. Learning merely pro- 
vides the raw materials which may later be reorganized under the 
stress of the problem situation. This point of view implies that 
different methods of learning would not have any effect upon sub- 
sequent problem-solving provided the same information is acquired 
under the different conditions. 

Evidence contrary to Maier’s conception of productive thinking 
has been obtained by Szekely (6). The situation presented to his 
Ss may be called the two-sphere problem. Given two identically 
appearing spheres having the same weight but one made of a 
heavy metal and the other of a light metal, the problem is to dis- 
tinguish between the two by some readily available method. A 
necessary preliminary step towards the solution of the problem 
which is readily attained by the Ss is that the sphere of heavy 
metal must be hollow in order to weigh the same as the sphere of 
light metal. The two spheres then can be distinguished by rolling 
them down an inclined plane obtained by tilting the table at which 
the S is seated. Because of its greater moment of inertia, the hol- 
low sphere of heavy metal should roll more slowly. Two groups of 
Ss received this problem. They differed only in the method of 
learning physics material from which a solution could be derived. 
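
The physical claim in the two-sphere problem can be checked with a short calculation. For a body rolling without slipping down an incline of angle θ, standard mechanics gives a = g sin θ / (1 + k), where k = I/(mR²). The sketch below is not part of the original study; the density ratio and incline angle used in any example are arbitrary assumptions, and the function names are hypothetical.

```python
import math

G = 9.81  # m/s^2, standard gravity


def inertia_factor(inner_ratio):
    """Return k = I / (m R^2) for a uniform spherical shell.

    inner_ratio is r/R, inner radius over outer radius: 0 gives a solid
    sphere (k = 2/5); values near 1 approach a thin shell (k = 2/3).
    """
    if not 0 <= inner_ratio < 1:
        raise ValueError("inner_ratio must be in [0, 1)")
    return 0.4 * (1 - inner_ratio**5) / (1 - inner_ratio**3)


def hollow_inner_ratio(density_ratio):
    """r/R that lets a dense hollow sphere match the mass of a light solid one.

    density_ratio = light density / heavy density, from
    heavy * (R^3 - r^3) = light * R^3.
    """
    return (1 - density_ratio) ** (1 / 3)


def rolling_acceleration(inner_ratio, incline_deg):
    """Acceleration down the incline when rolling without slipping."""
    k = inertia_factor(inner_ratio)
    return G * math.sin(math.radians(incline_deg)) / (1 + k)
```

For an assumed density ratio of 0.25, `inertia_factor(hollow_inner_ratio(0.25))` is about 0.61 against 0.40 for the solid sphere, so the hollow sphere accelerates less on any incline, which is the basis of the expected solution.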





1 This paper is based upon part of the Final Report written on Contract 
No. DA-04-495-Ord-381 between the Office of Ordnance Research, Depart- 
ment of the Army and the University of California. 

2 Now at the University of California, Riverside. 




One group of Ss was introduced to the physics material in the
“modern” problem-oriented manner. They were first shown a torsion
pendulum which consisted of a bar suspended in a horizontal
plane from a string. There were four hooks attached to the bar
from which weights could be suspended. Two hooks were symmetrically
located near the middle of the bar and two were symmetrically
located near the ends of the bar. Repeated rotation of the
bar would produce a torsion tension in the string which would
spin the bar when it was released. The problem for the Ss of the
“modern” group was to predict whether the bar would spin more
rapidly when weights were attached near the middle or at the ends
of the bar. They were then told that an answer to the problem
could be derived from the physics text which they were to study.
They were also instructed that a retention test covering the physics
material would be given at a later time.

A second group of Ss was presented the study material in the
“traditional” manner. They were simply instructed to study the
physics text and told that they would receive a retention test at a
later time. The torsion pendulum was presented near the conclusion
of the study period to illustrate a point in the text. Several days
later both groups received the two-sphere problem. Significantly
more Ss in the “modern” method group than in the “traditional”
method group solved the problem.

One purpose of this study was to repeat Szekely’s experiment 
with additional control groups in order to obtain further informa- 
tion concerning the variables influencing performance on the two- 
sphere problem. A second purpose of this study was to determine 
the relationships between a test of abstract reasoning, certain per- 
sonality scales, and performance on the two-sphere problem. 


METHOD 


Subjects. A total of eighty volunteer Ss from introductory psy- 
chology classes served in the experiment. There were thirty-five 
men and forty-five women distributed approximately proportion- 
ately among the different conditions. The three experimental 
groups and one control group each contained twenty Ss. 

Procedure. Approximately two to twelve weeks before the Ss 
served in the experiment, the Taylor Manifest Anxiety Scale (7) 
was administered to their entire class during the regular class hour. 

In the first experimental session the Crown Word Connexion
list (2) was initially presented to all Ss. This is a paper and pencil
test of neuroticism based on the principle of controlled association.
It is composed of fifty items each consisting of a stimulus word
and two possible response words. One is considered “normal” and
the other “abnormal.” Instructions are to underline the response
word that has a stronger connection with the stimulus word.

Following this test the Abstract Reasoning Test Form A of the 
Differential Aptitudes test battery (1) was administered. This is a 
paper and pencil test in which each of the fifty items consists of a 
series of four figures, each one of which bears a logical progressive 
relationship to the preceding one. These are followed by five alter- 
native choice figures. The problem is to choose the figure which 
continues the progression. Following the abstract reasoning test, 
one of several different problems was presented to the Ss. They 
were the x-ray and traffic problems used by Duncker (3) and the 
water-jar problem of Luchins (4). Szekely (6) also administered a 
series of problems to his Ss in order to conceal the purpose of the 
experiment. 

For the experimental groups the remainder of this session was 
devoted to the study of the physics text. The control group re- 
ceived the two-sphere problem at this time. Concepts discussed in 
the four and a quarter-page text given to the three experimental 
groups were mass, weight, inertia, acceleration, force, rotational 
motion, and moment of inertia.* Szekely (6) states that these are 
the topics covered in the text of approximately the same length 
presented to his Ss. Ss were told that they would be tested on the
material at the next session and that they could study the text 
until they felt satisfied that they understood its contents. 

The second session for the experimental groups began one or 
two days later with the presentation of one of Duncker’s problems. 
This was followed by the presentation of the two-sphere problem. 
Following the presentation of the two-sphere problem the experi- 
mental Ss were given a short examination of six questions cover- 
ing the material in the physics text. The various tests and the test 
order were the same for the three experimental groups. The only 
difference among the groups was the manner in which the physics 
text was presented. All of the Ss served individually in the experi- 
ment. 





* We are indebted to Mr. Joel M. Kibbee, teaching assistant in the De- 
partment of Physics, for aid in the preparation of the text. 







Group I: (Szekely’s traditional method). The Ss in the first experimental
group were asked to read the physics text and were told
that they would be tested on the material during the second session.
After reading the material they were given an illustration of the
moment of inertia using the torsion pendulum. The E demonstrated
that the pendulum would spin more rapidly when two
weights are suspended near the center of rotation than when they
are suspended near the periphery.

Group II: (Szekely’s modern method). The Ss of the second ex- 
perimental group were first presented with the torsion pendulum 
as a problem and asked to speculate as to whether it would spin 
more rapidly with the weights suspended near the center or the 
periphery. They were then given the text and told that a precise 
explanation of the principle would be found there. 

Group III: The third group received a combination of the two 
previous procedures. They were first given the torsion pendulum 
problem as in Group II, then the text, and then the torsion pen- 
dulum again as an illustration of the moment of inertia. This vari- 
ation in procedure was employed in order to more adequately 
determine the effects of the torsion pendulum illustration on sub- 
sequent problem-solving behavior. 

Although the solution to the two-sphere problem was dependent
upon the material previously studied, this fact was not brought to
the attention of the experimental Ss. It was given as just another
problem in the series of problems presented. The S was first asked
to visualize two spheres of equal weight and equal size, one of a
heavy metal and the other of a light metal. He was asked how this
condition could exist. At the same time the E drew circles of equal
diameter to represent the two spheres. After the solution was
verbalized, the E differentially shaded the two circles to indicate
that one was hollow and that one was solid. The S was then told
that he would be given thirty minutes to determine a method for
differentiating the two spheres without using any special apparatus
or damaging the spheres in any way. He was told that there was
a way of doing this, for example, right at the table at which he
was seated. During the thirty-minute period the E answered all
questions and indicated improper or unacceptable solutions. A
record of the S’s responses and questions was kept as well as the
time at which they occurred.










RESULTS AND DISCUSSION 


Simple analyses of variance of the various scale scores obtained 
by the different problem-solving groups gave an F of 2.14 for anx- 
iety, 0.53 for neuroticism and 1.52 for abstract reasoning. None of 
these F values are significant at the five per cent level of confi- 
dence, indicating that the variables presumably measured by these 
tests were randomly distributed among the different problem solv- 
ing groups. The retention test based on the physics material gave 
mean scores of 2.9, 2.8, and 2.8 for groups I, II, and III respectively, 
indicating no difference in retention among the three experimental 
groups. 

The performance criterion employed in the analysis of the re- 
sults of the two-sphere problem was simply success or failure. 
These categories were distributed among four classes of responses. 
S may never suggest using an inclined plane, and would be credited 
with a failure at the end of the thirty-minute test period. Such Ss 
spend the entire period giving suggestions such as tapping the 
spheres, bouncing them on the floor, heating or scratching them, 
and so on. If S suggested rolling the spheres down an inclined 
plane, three possible response classes may be elicited. He may 
state that the hollow sphere would roll more slowly, or the solid 
more rapidly. These were the only responses considered correct. 
Additional responses to the inclined plane situation that are in- 
correct are that the spheres would roll at the same rate, and that 
the hollow sphere would roll more rapidly or the solid sphere more 
slowly. 

Twenty per cent of the control group solved the problem cor- 
rectly while the percentage of Ss solving the two-sphere problem 
in each of the three experimental groups was fifty, fifty, and forty, 
respectively. The chi-square obtained from the analysis of these 
three groups was not significant. 

In order to determine whether studying the physics text facili- 
tated solution of the two-sphere problem, the frequency of successes 
and failures in the combined experimental groups was compared 
with that of the control group. A significant chi-square was ob- 
tained, 5.62, P = 0.02. 

The results obtained from the experimental groups are obviously
contrary to those of Szekely. Twenty per cent of the Ss in his
group I (traditional method) solved the problem while sixty-five
per cent of the Ss in his group II (modern method) solved the two-sphere
problem. The failure to obtain a difference between these
two groups in the present experiment as compared with Szekely’s
seems due largely to the superior performance of our group I. The
relatively poor performance of group I in his experiment is apparent
in that its frequency of success is identical with that of the
control group in the present study. Since the control group did not
receive the study material, its performance presumably reflects the
frequency of chance solutions. Regardless of the various hypotheses
that may be offered to account for the discrepancy in the results
of the two experiments, we may conclude that Szekely’s findings
are of limited generality. Further speculation would be pointless
until additional research can be conducted to investigate further
the effects of different methods of learning on subsequent
problem solving.

Forty-one per cent of the Ss had one or more courses in high
school or college physics. They were randomly distributed among
the four conditions. Forty-six per cent of the experimental Ss
with course work in physics solved the problem. Forty-five per
cent of the experimental Ss without courses in physics solved the
problem. A chi-square of 0.04 was obtained for this difference and
is not significant. However, the experimental Ss with prior courses
in physics had a mean retention score of 3.38, S.D. = 1.52, while
the Ss without prior work in physics had a retention score of 2.41,
S.D. = 1.37. The t obtained for this difference was 3.09, df = 58,
P = 0.01. These results show that prior training in physics produced
a significant increase in retention but did not increase the
number of solutions to the two-sphere problem.

Turning now to the relationships between performance on the
two-sphere problem and the other tests, it was found that Ss of
the experimental groups scoring above the median, 37, on the abstract
reasoning test produced significantly more solutions than
Ss falling below the median. Chi-square equaled 5.57, P = 0.02.
Analyzing the relationship in another manner gives essentially the
same results. The mean abstract reasoning score of the twenty-eight
solvers was 37.18, S.D. = 6.64; that of the non-solvers, 32.56,
S.D. = 9.58. The t for this difference is 2.16, df = 58, P = 0.05.
No relationship between abstract reasoning scores and retention
of the physics text was apparent.
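
The abstract reasoning comparison can be approximated from the summary figures alone. The sketch below uses a pooled-variance t; the article does not state which formula the authors used, so this is an assumption, and the count of 32 non-solvers is inferred from the sixty experimental Ss less the twenty-eight solvers.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t with a pooled variance estimate; returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Summary figures from the text; 32 non-solvers is inferred (60 - 28).
t, df = pooled_t(37.18, 6.64, 28, 32.56, 9.58, 32)
print(f"t = {t:.2f}, df = {df}")
```

Under these assumptions the result is roughly t = 2.14 with 58 degrees of freedom, in the neighborhood of the reported 2.16; the small gap may reflect rounding of the published means and standard deviations or a slightly different variance formula.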

A product moment correlation and a correlation ratio were computed
between the scores on the neuroticism scale and the abstract
reasoning test following their conversion into McCall T-scores. A
product moment correlation of 0.01 was obtained which obviously
does not differ significantly from zero. But the correlation ratio
was 0.67 which is significant, P < 0.01. High scores on the abstract
reasoning test tended to be associated with medium neuroticism
scores. Low abstract reasoning scores tended to be associated
with high and low neuroticism scores. However, the chi-square
analysis of the relative frequency of success on the two-sphere
problem and low, medium, and high levels of neuroticism was not
significant. Chi-square equaled 0.24.
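
A product moment correlation near zero alongside a sizable correlation ratio is the usual signature of a curvilinear relation. The sketch below illustrates the contrast on synthetic inverted-U data; the numbers are invented for illustration and are not the study's scores, and the functions are a generic textbook formulation rather than the authors' computation.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Product moment correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_ratio(categories, ys):
    """Eta of y on x: sqrt(between-category sum of squares / total)."""
    grand = mean(ys)
    groups = {}
    for c, y in zip(categories, ys):
        groups.setdefault(c, []).append(y)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_total = sum((y - grand) ** 2 for y in ys)
    return math.sqrt(ss_between / ss_total)


# Synthetic inverted-U data, NOT the study's scores.
neuroticism = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
reasoning = [30, 32, 38, 40, 44, 46, 38, 40, 30, 32]
```

On these made-up scores `pearson_r` is zero because the rising and falling halves cancel, while `correlation_ratio` is close to 1 because the category means differ sharply. The same pattern, in weaker form, is what the 0.01 and 0.67 reported above describe.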

Analysis of the effects of anxiety indicated that it was not sig- 
nificantly related to any of the other measures studied. The prod- 
uct moment correlation and the correlation ratio between anxiety 
and abstract reasoning scores following their conversion to T-scores 
were 0.07 and 0.38, respectively. Neither of these differed signifi- 
cantly from zero. There was a tendency for medium anxiety to be 
associated with high abstract reasoning scores. The percentages of 
success on the two-sphere problem at low, medium, and high levels 
of anxiety were twenty-six, fifty, and fifty-four, respectively. The 
low anxiety category ranged from one to ten, medium from eleven 
to twenty, and high anxiety ranged from a score of twenty-one 
and up. The chi-square obtained here equaled 3.26, which is not
statistically significant. But again there is a tendency for low
anxiety to be associated with failure in problem solving.


SUMMARY AND CONCLUSIONS 


Szekely’s (6) experimental evidence on problem solving on the 
two-sphere problem following different methods of learning is con- 
trary to Maier’s (5) theory of productive thinking. However, these 
results were not reproducible in the present study. None of the 
three different methods of presenting the physics material in- 
fluenced subsequent problem solving or retention. 

It was found, however, that success on the two-sphere problem 
was related to superior retention. Previous course work in high 
school or college physics was related to significantly greater reten- 
tion of the physics material, but it did not influence success on the 
two-sphere problem. Ss successfully solving the two-sphere problem
had significantly higher abstract reasoning scores than the Ss that 
failed. 







Crown’s (2) neuroticism scale was related in a curvilinear man- 
ner to performance on the abstract reasoning scale but was unre- 
lated to performance on the two-sphere problem and the anxiety 
scale. This absence of any relationship between the neuroticism 
and anxiety scales is somewhat surprising in view of the common 
clinical assumption that anxiety is a basic component of neurotic 
behavior (8). Performance on the manifest anxiety scale was not 
significantly related to performance on either the two-sphere prob- 
lem or the abstract reasoning test. 


REFERENCES 


1) G. K. Bennett, H. G. Seashore and A. G. Wesman, Differential Aptitude
Tests. New York, Psychological Corp., 1947.

2) S. Crown, “The Word Connexion List as a Diagnostic Test,” British
Journal of Psychology, 43: 103-112, 1952.

3) K. Duncker, “On Problem Solving,” Psychological Monographs, 58:
No. 5 (Whole No. 270), 1945.

4) A. Luchins, “Mechanization in Problem Solving,” Psychological
Monographs, 54: No. 6 (Whole No. 248), 1942.

5) N. R. F. Maier, “The Behavior Mechanisms Concerned with Problem
Solving,” Psychological Review, 47: 43-58, 1940.

6) L. Szekely, “Productive Processes in Learning and Thinking,” Acta
Psychologica, 7: 388-407, 1950.

7) J. A. Taylor, “A Personality Scale of Manifest Anxiety,” Journal of
Abnormal and Social Psychology, 48: 285-290, 1953.

8) R. W. White, The Abnormal Personality. New York, Ronald, 1948.








STUDENT ACHIEVEMENT AS A MEASURE OF 
INSTRUCTOR EFFECTIVENESS¹


JOSEPH E. MORSH, GEORGE G. BURGESS and PAUL N. SMITH 


Personnel Research Laboratory, Air Force Personnel and 
Training Research Center 


The identification of the successful instructor presents problems 
which have long been recognized. Over the past fifty years several 
hundred instructor evaluation studies have been made. The crite- 
rion of instructor effectiveness most widely used in these investi- 
gations has been some form of rating. Usually supervisor ratings 
have been used; but peer ratings, student ratings and self-ratings 
have also been tried out as criteria. 

While the studies appear to show that instructors can be reliably 
rated, ratings are not completely satisfactory as criteria of instruc- 
tor effectiveness. Ratings are subjective, can be biased, and may 
not be based on the more important aspects of instructor perform- 
ance. Dissatisfaction with supervisor ratings, consequently, has led 
to the search for other criteria that would, in some measure at
least, avoid the deficiencies inherent in ratings. It is generally
agreed that student growth, or gains (the amount of subject matter
the student learns with respect to his ability), is one of the most
important criteria of instructor effectiveness, if not the most important.
There is also general agreement that it is one of the most difficult 
criteria to measure because of the many controls required to make 
comparisons meaningful. 

Any determination of gain is dependent upon the availability of 
valid and reliable instruments for measuring student achievement. 
Proper precautions must be taken to eliminate the possibility of 
contamination of test results or of the rating measure used. The 
supervisor who rates the teacher should not also evaluate the 
students’ performance. Tests should have ample ceiling so that all 
students can demonstrate gain. 

¹ This investigation was carried out under the Air Force Personnel and
Training Research Center in support of Project No. 7950. Permission is
granted for reproduction, translation, publication, and use or disposal in
whole or in part by or for the United States Government.

Most investigators of student gain as a measure of instructor
effectiveness have been confronted with sampling problems. In
general, student gain in one subject matter area cannot be com- 
pared with gains of different students in other areas, and often it 
has not been possible to secure adequate samples of teachers all 
imparting the same subject matter to the same kind of students in 
the same kind of school situation. Consequently, in some studies 
the numbers of teachers and of pupils have been so small that any 
findings must be regarded as highly tentative. 

If student gains are to be used as a measure of instructor effec- 
tiveness, it is necessary to hold constant, insofar as possible, all 
relevant variables other than the effects of the teaching itself. The 
instructor is only one of many factors operating to produce changes 
in the students. Certain aspects of student change such as influence 
of previous instructors and personal and environmental factors 
may outweigh the effects of the instructor’s performance. Then 
too, the instructor is called upon to accomplish many changes in 
his students which are not measurable in terms of subject-matter 
achievement. Any measure of student gains, therefore, reflects 
only partially the instructor’s total effectiveness. 

If it can be determined that students of one instructor make 
greater gains than do those of another instructor, we can attack 
the problem of determining what behaviors, traits or character- 
istics of the successful instructor are responsible for the changes 
produced in the students. 


DESIGN OF THE RESEARCH 


In planning this study a course was required in which an ade- 
quate sample of instructors could be compared on the basis of 
their students’ achievement. From the available courses in the 
Air Force technical schools, the Aircraft Mechanic course at 
Sheppard Air Force Base was selected, and from this course the 
Hydraulics Phase, taught by one hundred and twenty-one instruc- 
tors, was chosen as best fitting the requirements of the study. This 
phase was taught in a single building in which two rows of class- 
rooms were arranged on either side of a central corridor. On each 
side, the first classroom had training equipment for the first day’s 
training, the second classroom had equipment for the second day’s 
training and so on for the seven days of training in the phase. 
Every weekday six new classes of about fourteen students each
entered the first day of training, two classes on each of three shifts.
Each class was assigned an instructor who moved along with his 
class through the seven days of training. The eighth day was de- 
voted to written and performance testing over the subject matter 
of the previous seven days. The instructor did not participate in 
the testing. 

Because the Hydraulics Phase lasted only eight days (including 
the testing day), it was feasible to duplicate the experiment, thus 
obtaining data based on two classes for each instructor. The cor- 
relation between the gains of the instructors’ first class and of 
their second class (taught approximately a month later) was used 
in determining the reliability of this measure of instructor effect- 
iveness. 


STUDENT VARIABLES 


Since matching of students to insure classes of approximately 
equal ability presented almost insurmountable difficulties, it was 
decided to make adjustments statistically rather than experiment- 
ally. For this purpose a special sixty-five item multiple choice pre- 
test of student knowledge about hydraulics was developed and 
administered to all students before they entered the Hydraulics 
Phase. Since the period of instruction in hydraulics was only eight 
days, the use of identical or even similar tests might have intro- 
duced considerable memorization and practice effects. The pre- 
test, therefore, was composed of items appropriate for students 
who had had some experience with the subject matter to be taught 
but had not been exposed to the specific course subject matter. It 
contained items pertaining to the background knowledge and the- 
ory required for learning the phase content as contrasted with the 
post-test which consisted of items covering specific subject matter 
information expected of students who had just completed the 
phase. The content of the pre-test was selected to correlate maxi- 
mally with the post-test but was not equivalent to it. After statis- 
tical determination of internal consistency and item validity as 
determined by correlation with the post-test scores of a sample of 
students not used in the experiment proper, four forms were con- 
structed. The four forms contained identical items but with 
varied order of items or alternatives to reduce the possibility of 
compromise of the key. That compromise was successfully eliminated
was shown by the fact that mean scores of subsequent
classes did not increase beyond chance expectancy throughout the 
course of the experiment. 

The written hydraulics post-test was composed of seventy-five 
multiple choice items which satisfied the conditions of the experi- 
ment with respect to difficulty, adequacy of incorrect alternatives, 
and internal consistency. Four forms were constructed in a man- 
ner similar to that of the pre-test. This hydraulics post-test was 
administered to all students (over three thousand) in the experi- 
mental group as their final written test in the Hydraulics Phase. 
All written pre-tests and post-tests were administered and scored 
by research personnel and were not seen by any hydraulics instruc- 
tor. 

The hydraulics performance examination, a job-sample type
test regularly used in the Hydraulics Phase, was included in this
study. It concerned checking units for internal leakage, identifying 
units and fittings, and making operational checks of various hy- 
draulics systems. Each student was required to complete ten of the 
twenty-eight performance items available. The time varied from 
ten to thirty minutes per item, the total practical test time being 
about four hours. Hydraulics instructors trained in the administra- 
tion of the performance test were used as examiners. In no case, 
however, did the same man participate in the study as both in- 
structor and examiner. Each item was graded from one to five 
points. In determining final performance grades, total raw scores 
were converted to T-scores.

The other student variables were the three final phase grades, 
one each for fundamentals, structures, and electrical, the three 
previous phases of instruction in the Aircraft Mechanics course. 


INSTRUCTOR VARIABLES 


Supervisors’ ratings of the hydraulics instructors were obtained 
by means of a form officially designated for use by the Air Training 
Command. It included both a forced choice and a graphic rating. 
Shift supervisors were also asked to rank-order the approximately 
forty instructors on each of the three shifts in terms of their
“general effectiveness.” Each instructor was assigned a T-score
based on rank and group size. 
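
The text does not say how rank and group size were turned into a T-score. One common convention, sketched here purely as an assumption, places each rank at the midpoint of its share of the group, maps that proportion through the inverse normal curve, and rescales to a mean of 50 and a standard deviation of 10. The function name `rank_to_t_score` is hypothetical.

```python
from statistics import NormalDist

def rank_to_t_score(rank, group_size):
    """Convert a rank (1 = most effective) to a normalized T-score.

    Assumed convention: the proportion of the group falling below the
    midpoint of this rank is mapped to a normal deviate z, and T = 50 + 10 z.
    """
    proportion_below = 1.0 - (rank - 0.5) / group_size
    z = NormalDist().inv_cdf(proportion_below)
    return 50.0 + 10.0 * z

# Ranks in a shift of forty instructors (group size taken from the text).
for rank in (1, 10, 20, 30, 40):
    print(rank, round(rank_to_t_score(rank, 40), 1))
```

Because group size enters the proportion, the same rank yields different T-scores in groups of different size, which makes ranks from unequal shifts comparable.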

Hydraulics instructors were asked to rank-order their colleagues 
on the same shift in terms of their effectiveness as instructors,
omitting those whom they did not know well enough to rank. The
rankings were done anonymously. Mean ranks were obtained and 
converted to T-scores.

Just before taking their final phase examination, students made 
an over-all rating of their instructors. They also rated them as 
outstanding, very good, good, poor or unsatisfactory on: (1) knowl- 
edge of subject, (2) teaching methods, (3) understanding of stu- 
dents, (4) as a personal friend. They then ranked these four quali- 
ties in the order of their instructor’s relative strength in them. The 
instructors were scored separately on each of the four qualities by 
averaging over students of the two classes to produce two mean 
scores on each quality, one for the rating and one for the ranking. 

The Wonderlic Personnel Test was used as a measure of instruc- 
tor general intelligence. 

Instructor subject matter knowledge was measured in terms of a 
one hundred and twenty-five-item multiple choice proficiency 
examination developed for airplane hydraulic specialists. 

Verbal facility ratings were obtained by having six supervisors 
rank-order, in groups of six, individual instructors on their organ- 
ization and presentation of special material which they had been 
allowed fifteen to twenty minutes to prepare. Verbal facility scores 
were determined by averaging the ranks assigned by these supervisors
and converting to T-scores.

These instructor variables were evaluated in terms of three 
instructor effectiveness scores. These latter scores were based upon 
the average student post-test scores adjusted for differences in the 
average pre-test and ability scores. The adjustment was based on a 
multiple regression equation computed from within instructor and 
between classes variances and covariances. Two effectiveness scores 
were computed for each instructor, one based on the written post- 
test and one based on the performance post-test. Since these scores 
had variances of similar magnitude and lacked a basis for arbitrary 
weighting, a combined effectiveness score was obtained by weight- 
ing them equally. 
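
The adjustment described above can be illustrated in simplified form. The sketch below is not the authors' computation: it uses a single covariate (the class mean pre-test) where the study used pre-test and ability scores in a multiple regression, and the class means are invented. The slope is pooled from variation between classes within each instructor, so that differences between instructors do not contaminate it.

```python
def pooled_within_slope(classes):
    """classes: list of (instructor, mean_pre, mean_post), one tuple per class.

    Returns the regression slope of post on pre computed from deviations
    of each class around its own instructor's means.
    """
    by_instructor = {}
    for instructor, pre, post in classes:
        by_instructor.setdefault(instructor, []).append((pre, post))
    sxy = sxx = 0.0
    for rows in by_instructor.values():
        mean_pre = sum(p for p, _ in rows) / len(rows)
        mean_post = sum(q for _, q in rows) / len(rows)
        for pre, post in rows:
            sxy += (pre - mean_pre) * (post - mean_post)
            sxx += (pre - mean_pre) ** 2
    return sxy / sxx

def adjusted_scores(classes):
    """Average post-test per instructor after removing the pre-test effect."""
    slope = pooled_within_slope(classes)
    grand_pre = sum(pre for _, pre, _ in classes) / len(classes)
    adjusted = {}
    for instructor, pre, post in classes:
        adjusted.setdefault(instructor, []).append(post - slope * (pre - grand_pre))
    return {k: sum(v) / len(v) for k, v in adjusted.items()}

# Hypothetical class means: two instructors, two classes each.
classes = [("A", 20.0, 50.0), ("A", 30.0, 70.0),
           ("B", 25.0, 50.0), ("B", 35.0, 70.0)]
scores = adjusted_scores(classes)
print(scores)
```

In this invented example instructor B's classes entered with higher pre-test means yet reached the same post-test means, so the adjusted score favors instructor A.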


RESULTS 


The reliabilities of instructor effectiveness scores were estimated 
from the correlation between scores based upon the achievement of 
the first class and those based upon the achievement of the second 
class. 







In terms of the one hundred and six classes in each group, first 
class written gains correlated with the second class written gains 
r = 0.34; first class performance gains correlated with second class 
performance gains r = 0.32 and when the two measures were com- 
bined the coefficient was 0.38. When adjusted for double length by 
means of the Spearman-Brown formula, these coefficients became 
0.51, 0.48, and 0.55 respectively. These latter coefficients represent 
the estimate of reliability when the two classes for each instructor 
are used to obtain an effectiveness score. 
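
The step from the single-class correlations to the two-class reliabilities follows the Spearman-Brown formula, r_k = k r / (1 + (k - 1) r), with k = 2 for doubled length. A short sketch reproducing the three reported adjustments (the function name is illustrative):

```python
def spearman_brown(r, k=2.0):
    """Predicted reliability when the measure is lengthened k times."""
    return k * r / (1.0 + (k - 1.0) * r)

# Single-class correlations reported in the text: written, performance, combined.
for r in (0.34, 0.32, 0.38):
    print(r, round(spearman_brown(r), 2))
```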

The correlation coefficients reported below are based on the one 
hundred and six instructors for whom complete data were available. 
With this sample size a coefficient of 0.25 is significantly different 
from zero at the 0.01 level of confidence while 0.19 is significant at 
the 0.05 level. The written and performance post-test gains were 
correlated 0.55. 
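
The two significance thresholds can be recovered from the usual relation between r and t, namely t = r sqrt(n - 2) / sqrt(1 - r^2), solved for r. The sketch below assumes two-tailed critical t values of roughly 1.983 and 2.624 for 104 degrees of freedom; these are approximate table values supplied for illustration, not figures given in the article.

```python
import math

def critical_r(t_crit, n):
    """Smallest |r| reaching the given critical t with n paired observations."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

N_INSTRUCTORS = 106
T_05 = 1.983  # assumed two-tailed 0.05 critical t for df = 104
T_01 = 2.624  # assumed two-tailed 0.01 critical t for df = 104

print(round(critical_r(T_05, N_INSTRUCTORS), 2))
print(round(critical_r(T_01, N_INSTRUCTORS), 2))
```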

Student Ratings. Students’ overall rating of instructors was correlated
significantly with all three of the gains criteria (written
r = 0.32, performance r = 0.39, and combined gains r = 0.40).
Slightly higher correlations were found for students’ ratings of in- 
structors’ teaching ability, the r’s being 0.41, 0.41, and 0.46, re- 
spectively, for the rating and 0.41, 0.42, and 0.47 for trait ranking 
with the three gains criteria. 

Students’ ratings of instructors’ understanding of students were 
correlated significantly with the three gains measures (r’s = 0.24, 
0.26, and 0.28) but students’ rankings of this trait failed to cor- 
relate significantly with any gains criterion. 

Students’ ratings of instructors’ knowledge of subject were cor- 
related significantly only with performance gains (r = 0.22). Stu- 
dents’ ranking of this trait showed a significant negative correlation 
with written gains (r = -0.20) and with the combined gains
criterion (r = -0.22).

Students’ rating of the instructor as a friend was not significantly 
correlated with any of the three gains criteria, while their ranking 
of the friendship trait was correlated negatively with all three gains 
measures (r’s = -0.40, -0.22, and -0.36, respectively). The
negative correlations found in the case of the trait-rankings prob- 
ably represent an artifact due to the requirements of the ranking 
technique. The sum of the correlation coefficients of the four
trait-rankings with an outside variable should approximate zero
because a high score on one trait is compensated for by a low score
on another trait.

TABLE I.—RELIABILITY OF STUDENT RATINGS
(N = 106)

Student Rating                                       r*      r†
1. Over-all (fortunate in having this instructor)    0.26    0.41
2. Teaching ability (rating)                         0.34    0.51
3. Teaching ability (trait-ranking)                  0.32    0.49
4. Understanding of students (rating)                0.26    0.41
5. Understanding of students (trait-ranking)         0.42    0.60
6. Knowledge of subject (rating)                     0.33    0.49
7. Knowledge of subject (trait-ranking)              0.32    0.49
8. As a friend (rating)                              0.18    0.31
9. As a friend (trait-ranking)                       0.21    0.34

* Correlations between mean ratings by first class and second class.
† r adjusted by Spearman-Brown formula for double length.
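
The ranking artifact can be shown directly. When every rater must distribute the ranks 1 to 4 over four traits, the four ranks always add to 10, so their covariances with any outside variable add to exactly zero; the correlations add only to approximately zero because the traits differ in spread. The sketch uses randomly generated ranks and an unrelated outside variable, both invented for the demonstration.

```python
import random

def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

random.seed(0)
n_raters = 200
outside = [random.gauss(0.0, 1.0) for _ in range(n_raters)]            # any outside variable
rankings = [random.sample([1, 2, 3, 4], 4) for _ in range(n_raters)]   # forced ranks per rater

trait_covs = [covariance([row[t] for row in rankings], outside) for t in range(4)]
cov_sum = sum(trait_covs)
print([round(c, 3) for c in trait_covs], round(cov_sum, 12))
```

A positive relationship for one trait therefore has to be offset by negative ones elsewhere, whatever the raters actually think of the instructor.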

Since the magnitude of the relationship between student ratings 
of instructor effectiveness and student gains was encouraging, the 
reliability of student ratings was estimated by computing the cor- 
relation between instructor ratings made by the first class and by 
the second class. These results are shown in Table I. 

The high correlation between student gains and student ratings, 
however, could be accounted for by factors specific to the student or 
to the class situation with little reliability over time. To test this 
possibility the correlations between first class student ratings and 
second class gains and vice versa were determined. First class 
student gains correlated significantly with second class student 
ratings of teaching ability r = 0.34, with student ranking of 
teaching ability r = 0.25, and with over-all student rating r = 0.29. 
Second class student gains correlated with these same first class 
measures r = 0.08, r = 0.25, and r = 0.09, respectively. The true 
values of the correlations possibly lie somewhere between these 
two sets of values. If so, some real validity can be attributed to the 
student ratings. 

All student rating variables tended to be highly intercorrelated, 
with the students’ trait-ranking of the instructor as a friend and 
instructor’s knowledge of subject matter tending to give significantly
negative correlations. As explained above, these negative
correlations were probably artifacts of the trait-ranking technique. 

Peer Rankings and Supervisor Ratings and Rankings. Neither peer 
rankings nor supervisor ratings nor rankings were correlated sig- 
nificantly with any of the three student gains criteria. The two 
supervisor measures when correlated yielded a coefficient of 0.67. 
Supervisor forced-choice ratings were correlated with peer-ranking
(r = 0.52) while supervisor-ranking and peer-ranking correlated 
0.77. This high correlation found between supervisor and fellow 
instructor rankings coupled with the fact that neither of these 
measures correlated highly with the student gains criteria suggests 
that both supervisors and peers judged instructors on the basis of 
factors other than teaching effectiveness as measured by the gains 
criteria. As will appear later, one of these factors seems to be the 
instructors’ knowledge of subject matter. We thus see rather close 
agreement between peer and supervisor opinion but fellow instruc- 
tors and supervisors agree only slightly with student opinion. 

Other Measures. Verbal facility ratings correlated significantly 
with the written gains measure (r = 0.20) but with neither of the 
other gains criteria. Neither instructors’ hydraulics subject matter
knowledge nor intelligence was correlated significantly with
any of the student gains criteria. We see, therefore, that student 
ratings of their instructors were the only instructor measures which 
seemed to predict the student gains criterion. 

Verbal facility, however, was correlated positively with scores 
on the Wonderlic test (r = 0.23) but with no other instructor meas- 
ure. There was a correlation of 0.44 between intelligence test 
scores and the hydraulics proficiency examination. A correlation 
coefficient of 0.43 was found between scores on the proficiency test 
and students’ ranking of instructors’ knowledge of subject matter, 
and an r of 0.19 between the proficiency test and students’ rating 
of instructors’ knowledge of the subject. There was a negative
correlation (r = -0.22) between the proficiency test and students’
rating of the instructor as a friend, the correlation not being 
significant in the case of the friendship trait-ranking. Students’ 
trait-ranking of the instructors’ understanding of students also
correlated negatively (r = -0.27) with instructors’ proficiency test
scores. Here again, these negative correlation coefficients may be 
due to the trait-ranking technique used. 










IMPLICATIONS 


There is a continuing need for measures to be used in the evalua- 
tion of instructor performance. Such standards of success in teach- 
ing are required so that the most effective instructors may be pro- 
moted, upgraded, or given supervisory responsibility while the less 
effective instructor may be given remedial training or reassign- 
ment. 

The present study was aimed at developing a reliable criterion 
of instructor effectiveness. Although the work was done with 
instructors in an aircraft mechanics course at an Air Force in- 
stallation, it was anticipated that the results of the study would 
find application in other teaching situations. 

Because it provides the most objective criterion so far discovered, 
the effectiveness measure used was the actual subject-matter 
achievement of an instructor’s students. Since a precise comparison 
of instructors in terms of student achievement is usually imprac- 
tical, an attempt was made in the study to find other character- 
istics of the instructor, besides ability to impart subject matter, 
which might be related to student gains and could thus be used to 
predict student achievement. 

The chief results of the investigation were the findings that stu- 
dent gains can be reliably measured and that students’ ratings of 
their instructors’ teaching effectiveness, and supervisors’ rating 
of instructors’ verbal facility are correlated significantly with stu- 
dent gains. 

Technical school students appear to know when they are well 
taught. Their ratings of their instructors offer promise as a tech- 
nique of instructor evaluation. Teaching which emphasizes acquisi- 
tion of subject matter may induce a favorable attitude of stu- 
dents toward their instructors. 

The high relationship found between ratings and rankings by 
fellow instructors and supervisors, together with the fact that these 
measures appear unrelated to student gains, suggests that fellow 
instructors and supervisors judge instructor effectiveness on the 
basis of factors other than what students learn. One of these fac- 
tors appears to be the instructor’s knowledge of subject matter. 
The study also suggests that further, more detailed investigation 
might be made of speech factors as related to instructor effectiveness.







SUMMARY 


(1) This study showed that under some conditions, student gains 
can be reliably measured. 

(2) The students appeared to know when they were well taught. 
Student ratings, therefore, offer promise as a technique for instruc- 
tor evaluation. 

(3) The students’ rating of instructors’ subject-matter knowledge 
was correlated significantly with instructors’ proficiency test 
scores. 

(4) Little relationship was found between student gains and 
instructor intelligence or knowledge of subject matter. 

(5) Little relationship was shown between supervisor or fellow 
instructor estimates of instructor effectiveness and student gains. 

(6) The high correlations found between fellow instructor and 
supervisor rankings plus the fact that neither of these measures 
correlated highly with student gains suggests that fellow instructors
and supervisors judge instructor effectiveness on the basis of factors other than
student achievement. One of these factors appears to be subject- 
matter knowledge. 

(7) The correlation between gains on the written and perfor- 
mance tests indicates that they have about twenty-five per cent 
common variance. These measures show a logical patterning with 
other measures which implies that gains scores are specific to the 
type of post-test used. 
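
Common variance between two measures is conventionally taken as the square of their correlation. A brief sketch of that arithmetic, applied to the coefficient of 0.55 reported in the results (the function name is illustrative):

```python
def shared_variance(r):
    """Proportion of variance two measures share, taken as r squared."""
    return r * r

print(round(shared_variance(0.55), 2))  # coefficient reported for written and performance gains
print(round(shared_variance(0.50), 2))  # for comparison
```

Squaring 0.55 gives about 0.30, while a coefficient of 0.50 corresponds to exactly one quarter, so the summary's figure reads as a rounded approximation.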

(8) The correlation between verbal facility rating of the instruc- 
tor and the student written gains criterion while not high was sig- 
nificant (at the 0.05 level) and suggests further investigation of 
speech factors as related to instructor effectiveness. 








PERSONALITY FACTORS OF OVER- AND 
UNDER-ACHIEVERS IN ENGINEERING¹


ELVA BURGESS 


Alabama Polytechnic Institute 


PROBLEM 


The purpose of this study was the investigation of the hypoth- 
esis that students who over-achieve in a college situation have com- 
mon personality factors which differentiate them from college stu- 
dents who under-achieve. 


THE SAMPLE 


From a population of four hundred and ninety-two male fresh- 
men engineering students for whom predictions of first semester 
academic performance at the Pennsylvania State College had been 
made in the fall of 1951, two samples of twenty each were selected 
for study. 

The predictive index which had been used was a multiple co- 
efficient of correlation of 0.62, based on Arithmetic and Algebra 
subtests of the Moore-Castore Test of Academic Aptitude (9) and 
high school rank in fifths. 

The samples were drawn in the following manner: From the 
Office of the College Registrar the grade-point average earned by 
each of the four hundred and ninety-two students during the fall 
of 1951 was obtained. The mean and standard deviation were com- 
puted. Mean grade-point average was 1.10, S. D. 0.93, almost ex- 
actly the same as had been previously reported for the engineering 
freshmen in this College in the fall of 1950. 

Applying the range, 1.33 to 0.87, which represents the mean 
grade-point average plus or minus one-quarter of a standard devi- 
ation, to the predictions shown on an alphabetical list of the four 
hundred and ninety-two students, a second list of one hundred and 
twenty-eight students was obtained. These represented the one 
hundred and twenty-eight students whose predicted grades had
been within one-quarter of a standard deviation from the mean of
the actual grades earned.

¹ This article is based on a dissertation submitted in partial fulfillment
of the requirements for the Ph.D. degree, The Pennsylvania State College,
1953.

From the list of one hundred and twenty-eight students, the 
twenty whose earned grade-point average deviated most above 
their predicted grades were selected and became one crite- 
rion group, called the over-achievers. The twenty students whose 
actual grades showed the greatest deviation below their predicted 
grade-point average were selected as the other criterion group, 
called the under-achievers. The mean grade-point average predicted
for the over-achieving group was 1.10, S. D. 0.14; for the
under-achieving group the mean was 1.09, S. D. 0.10.
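
The sampling procedure amounts to two steps: keep students whose predicted average falls within a quarter of a standard deviation of the mean, then take the extremes of actual minus predicted. A minimal sketch with invented student records follows; the field names, the function name, and the guard against overlapping groups are assumptions added for the illustration.

```python
def select_criterion_groups(students, mean_gpa, sd_gpa, group_size=20):
    """Return (over_achievers, under_achievers) from a list of student dicts.

    Each dict needs 'predicted' and 'actual' grade-point averages.
    """
    low = mean_gpa - 0.25 * sd_gpa
    high = mean_gpa + 0.25 * sd_gpa
    band = [s for s in students if low <= s["predicted"] <= high]
    if len(band) < 2 * group_size:
        raise ValueError("too few students in the band for non-overlapping groups")
    band.sort(key=lambda s: s["actual"] - s["predicted"])
    return band[-group_size:], band[:group_size]

# Hypothetical records; the mean 1.10 and S.D. 0.93 are the values reported in the text.
students = [
    {"name": "a", "predicted": 1.0, "actual": 2.0},
    {"name": "b", "predicted": 1.2, "actual": 0.5},
    {"name": "c", "predicted": 1.1, "actual": 1.1},
    {"name": "d", "predicted": 2.0, "actual": 3.5},  # outside the band, excluded
    {"name": "e", "predicted": 0.9, "actual": 1.2},
]
over, under = select_criterion_groups(students, 1.10, 0.93, group_size=1)
print(over[0]["name"], under[0]["name"])
```

With the reported mean and standard deviation the band runs from about 0.87 to 1.33, matching the range given in the text.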

With the exception of one member of the over-achieving group 
who was twenty-two years of age and a veteran, all students were 
in their eighteenth or nineteenth year at the time of testing. All 
were white and unmarried. Each of the individual engineering cur- 
riculums was represented in each group. 


TEST ADMINISTRATION AND SCORING 


A battery of six tests, as follows, was individually administered 
by the experimenter to all members of both groups, with the ex- 
ception of one member of the over-achieving group, who completed 
only the first two tests: 

(1) The Rorschach Technique (11).

(2) The Minnesota Multiphasic Personality Inventory, Group 
Form (6). 

(3) The Murray Thematic Apperception Test, Cards 1, 2, 3BM, 
8BM, 9BM, 11, 14, 17BM, 18BM, 19, and 16, in that order (10). 

(4) The Rosenzweig Picture Frustration Study (12).

(5) The Strong Vocational Interest Blank for Men (14).

(6) The College Inventory of Academic Adjustment (Borow) (8).

In addition, results of the Bernreuter Personality Inventory (4)
for each of the subjects were obtained from the files of the Student 
Advisory Service of The Pennsylvania State College, and used in 
the study. This test had been administered by the Student Advisory 
Service staff a few months previously. 

Until all testing, test scoring and checking were completed, the 
subjects were identified to the examiner only by testing number. 
This precaution was deemed desirable in order to minimize and 
equalize as much as possible the effects of the subjectivity which,
by the nature of the instruments, enters into the administration
and scoring of the projective techniques, particularly the Rorschach 
and the Thematic Apperception Test. 

The Rorschach Technique was administered in accordance with 
the method recommended by Beck, his suggested directions having 
been used as standard directions for this testing (2). Scoring also 
followed the Beck system, with the exception that the protocols 
were not scored for his Z factor, and were scored for FM (animal 
movement) and m (inanimate movement), as is done by Klopfer 
(8) and Hertz (7). For scoring quality of form, statistical tables 
published by Beck (2) and by Hertz (7) were consulted, in the 
order named. 

The subjects wrote their T.A.T. stories, with a time limit of five 
minutes for each, except in the case of Card 16 where some exten- 
sion of time was granted. The T.A.T. stories were scored for needs, 
press, and outcome in accordance with the scoring system published
by Stein (13).

All other tests were administered in accordance with the direc- 
tions given in the manual provided by the authors of each test. 
The Strong Vocational Interest Blank for Men was scored on only 
two keys, the Occupational Level Scale and the Interest Maturity 
Scale. All scoring was checked by re-scoring, even the projective 
techniques. 


STATISTICAL ANALYSIS 


Means and standard deviations were computed, and the significance
of the difference between the means of the two groups was tested
by the t-ratio for small samples for the following numbers of
test variables:

(1) Rorschach Technique: forty-five location, productivity, and 
determinant variables, plus thirty-five signs. 

(2) The Minnesota Multiphasic Personality Inventory, Group 
Form: fourteen variables. 

(3) Murray Thematic Apperception Test: seventeen variables. 

(4) The Rosenzweig Picture Frustration Study: twenty-five var- 
iables. 

(5) Strong Vocational Interest Blank for Men: two variables. 

(6) Borow College Inventory of Academic Adjustment: seven 
variables. 

(7) Bernreuter Personality Inventory: six variables. 








TABLE I—SUMMARY: TEST FACTORS FOR WHICH A "t" OF 1.0 OR MORE
WAS OBTAINED FOR DIFFERENCE BETWEEN MEANS

                                            Over-Achievers         Under-Achievers
Test                                       N     Mean    S.D.     N     Mean    S.D.      t*

Rorschach                                 20                     20
  VIII-X per cent                               32.10    8.12          38.68    7.54   -2.59
  Affective ratio (Beck)                         0.49    0.18           0.65    0.19   -2.68
  R, total responses                            31.15   16.73          37.70   18.60   -1.14
  P, popular responses                           5.65    2.39           6.60    2.20   -1.28
  A% (animal content)                           43.00   13.60          51.60   16.45   -1.76
  F% (form only)                                55.80   16.38          61.25   15.48   -1.05
  FC                                             1.60    0.97           2.35    1.73   -1.65
  H + A                                         13.15    5.77          16.95    8.78   -1.58
  Hd + Ad                                        5.80    5.91           8.30    7.14   -1.18
  S% of R                                        8.40    6.20           5.90    5.30    1.34
  V% of shading responses                       41.69   32.38          25.56   25.93    1.49
  H + A: Hd + Ad                          16     2.42    1.17    18     3.91    3.98   -1.48
  Fluctuation in productivity                    1.34    1.41           1.52    1.62   -1.13

Thematic Apperception Test                19                     20
  Ending happy                                   6.21    2.98           5.25    2.81    1.01
  Total needs                                   19.00    5.43          16.45    4.36    1.57
  Total needs and press                         23.93    6.94          20.90    6.32    1.40
  Needs: hero initiates activities re:          10.68    4.05           8.15    3.62    2.00
    situations or objects
  N achievement                                  2.16    1.14           0.90    1.09    3.44
  Aggressive needs                               1.47    1.50           0.55    0.59    2.43
  Improvement needs                              5.84    2.76           3.25    2.86    2.80
  "Freedom" needs                                1.26    1.33           1.95    1.66   -1.40
  Dependent needs                                0.32    0.57           1.30    1.31   -2.97

Rosenzweig Picture Frustration Study      19                     20
  I'                                             0.97    0.64           1.22    0.76   -1.09
  M (in E-D column)                              3.68    1.62           2.78    1.29    1.86
  M (total)                                      6.63    1.75           5.92    1.67    1.26
  M%                                            27.58    7.07          24.95    7.15    1.12
  O-D%                                          17.42    6.30          19.95    4.64   -1.38
  E-D%                                          54.11    9.74          50.30    6.59    1.38
  E%                                             6.14    3.46           7.22    2.53   -1.08
  I%                                             7.45    4.16           5.43    2.74    1.72
  O-D column                                     4.16    1.50           4.75    1.15   -1.34
  E-D column                                    12.97    2.33          12.02    1.64    1.42

MMPI                                      20                     20
  Hs scale                                      12.25    1.37          11.05    2.65    1.75
  Mf scale                                      21.70    3.49          23.50    3.74   -1.45

Strong Vocational Interest Blank for Men  19                     20
  Occupational level                             7.79   47.82         -16.00   53.25    1.43

Borow College Inventory of Academic       19                     20
  Adjustment
  Part I                                        17.79    4.38          14.70    5.96    1.80
  Part II                                       22.42    4.77          20.05    6.60    1.26
  Part III                                      25.68    5.07          19.30    8.61    2.76
  Part IV                                       29.89    8.29          26.65    9.67    1.10
  Part VI                                       20.95    4.05          18.90    5.37    1.32
  Total score                                  137.26   27.35         118.40   39.02    1.71











* A t of 2.72 is significant at the one per cent level; 2.43, at two per cent 
level; 2.03, at five per cent level; 1.69, at ten per cent level. Negative value 
indicates that the difference was in favor of under-achievers. 
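
The t-ratios in Table I can be checked approximately from the tabled means, standard deviations, and group sizes. The article does not state which small-sample formula was used. The Python sketch below assumes that each standard deviation was computed with N in the denominator, so that the standard error of a mean is S.D./sqrt(N - 1); this assumption reproduces the tabled values for the rows tried. The function name `t_from_summary` is illustrative only.

```python
import math


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """t-ratio for the difference between two independent means.

    Assumption (not stated in the article): each S.D. was computed with N
    in the denominator, so the standard error of a mean is sd / sqrt(N - 1).
    """
    se_diff = math.sqrt(sd1 ** 2 / (n1 - 1) + sd2 ** 2 / (n2 - 1))
    return (mean1 - mean2) / se_diff


# Rows taken from Table I: over-achievers first, under-achievers second.
t_n_ach = t_from_summary(2.16, 1.14, 19, 0.90, 1.09, 20)     # tabled t: 3.44
t_viii_x = t_from_summary(32.10, 8.12, 20, 38.68, 7.54, 20)  # tabled t: -2.59

print(round(t_n_ach, 2), round(t_viii_x, 2))
```

Small discrepancies in the second decimal place are to be expected, since the tabled means and standard deviations are themselves rounded. A negative result indicates, as in the table, that the under-achievers had the higher mean.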


Protocols of the T.A.T. were also studied in an effort to locate 
differentiating content material. 


RESULTS 


All factors for which the t, obtained for the significance test of
the difference between means, was 1.0 or higher are presented in
Table I. Statistics for the six Rorschach signs are given in Table II.

For none of the Rorschach variables was the difference significant
at the one per cent level of confidence. (However, two of the signs
were significant at this level.) Two variables, VIII-X per cent and
affective ratio (Beck), were in favor of the under-achievers and
significant at a level beyond the two per cent but not reaching the
one per cent level of confidence.

Variables for which the obtained t was not significant but was
greater than 1.0, with the difference in favor of the under-achievers,
were: Total R, A per cent, F per cent, P, FC, H + A, Hd + Ad,
H + A: Hd + Ad, and fluctuation in productivity.

Variables on which the over-achievers scored higher and the t
was 1.0 or greater but did not reach significance were: S per cent and
V per cent (per cent of vista responses to all shading responses).

The sign-check of the Rorschach protocols revealed six signs 
which, on the basis of a t-test of the difference between percentages, 
differentiated the two groups. Two of these were negative signs. 










TABLE II—SIGNIFICANCE OF RORSCHACH SIGNS

                                          Over-Achievers (N = 20)   Under-Achievers (N = 20)
                                                Per       SE             Per       SE
Sign                                       N    cent   per cent     N    cent   per cent      t*

Ap is W, (W), or D                        14      70      10.25     8      40      10.95    2.00
A%, 50% or less                           16      80       8.94     9      45      11.14    2.45
Affective ratio, 0.65 or less             17      85       8.00    10      50      11.18    2.68
No poor form quality (F-) or pure C on    17      94†      5.61     5      28†     10.58    5.51
  Card IX
FC equal to or greater than CF             8      40      10.95    14      70      10.25   -2.00
Per cent of responses to last 3 cards,     3      15       8.00    11      55      11.14   -2.92
  40 to 60%

* A t of 2.03 is significant at the five per cent level; 2.43, at two per cent
level; 2.72, at one per cent level. Negative value indicates difference was in
favor of under-achievers.
† N is 18 because of two card rejections.
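
The t-test of the difference between percentages used for the signs can be illustrated from the first row of Table II. The article does not give the formula; the Python sketch below assumes the standard error of a percentage is sqrt(p(100 - p)/N) and that the two standard errors are combined without pooling, which agrees with the tabled standard errors for that row. The function names are illustrative.

```python
import math


def se_percent(pct, n):
    """Standard error of a percentage, expressed in percentage points.

    Assumed formula (not given in the article): sqrt(p * (100 - p) / N).
    """
    return math.sqrt(pct * (100.0 - pct) / n)


def t_between_percents(pct1, n1, pct2, n2):
    """t-ratio for the difference between two independent percentages."""
    se_diff = math.sqrt(se_percent(pct1, n1) ** 2 + se_percent(pct2, n2) ** 2)
    return (pct1 - pct2) / se_diff


# First sign in Table II: 70 per cent of 20 over-achievers,
# 40 per cent of 20 under-achievers (tabled SEs 10.25 and 10.95, t 2.00).
t_ap = t_between_percents(70, 20, 40, 20)
print(round(se_percent(70, 20), 2), round(se_percent(40, 20), 2), round(t_ap, 2))
```

The same two functions can be applied to the other rows; for the Card IX sign the group size is 18 rather than 20.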


The six signs were: 

Ap is W, (W) or D (Beck), t significant just short of the 5 per
cent level of confidence.

A per cent, fifty per cent or less, significant at the two per cent
level.

Affective ratio, 0.65 or less, significant beyond the two per cent
level.

No F- or pure C on Card IX, significant beyond the one per cent 
level. 

FC equal to or greater than CF, a negative sign, significant just 
short of the five per cent level. 

Per cent of responses to last 3 cards, forty to sixty per cent, a 
negative sign, significant beyond the one per cent level. 

When a point-score of 1 was assigned to each and a total sign- 
score of 4 used as the cutting score for achievement, eighty per cent 
of the over-achievers were correctly classified and only ten per cent 
of the under-achievers incorrectly included as over-achievers. 
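
The sign-score procedure can be sketched in Python as follows. Two points are assumptions, since the article does not spell them out: a negative sign is taken to earn its point when it is absent from the protocol, and a total of 4 or more is taken to classify a subject as an over-achiever. The dictionary keys are hypothetical labels for the six signs.

```python
# Hypothetical labels for the six signs listed above.
POSITIVE_SIGNS = (
    "ap_is_w_or_d",
    "a_pct_50_or_less",
    "affective_ratio_0_65_or_less",
    "no_f_minus_or_pure_c_on_card_ix",
)
NEGATIVE_SIGNS = (
    "fc_equal_or_greater_than_cf",
    "last_three_cards_40_to_60_pct",
)
CUTTING_SCORE = 4


def sign_score(protocol):
    """Total sign-score for one protocol (dict: sign label -> bool).

    Assumption: a positive sign scores 1 when present, and a negative
    sign scores 1 when absent.
    """
    positive = sum(1 for sign in POSITIVE_SIGNS if protocol[sign])
    negative_absent = sum(1 for sign in NEGATIVE_SIGNS if not protocol[sign])
    return positive + negative_absent


def classify(protocol):
    """Assumption: a score at or above the cutting score means over-achiever."""
    if sign_score(protocol) >= CUTTING_SCORE:
        return "over-achiever"
    return "under-achiever"


# Invented example protocol, for illustration only (not data from the study).
example = {
    "ap_is_w_or_d": True,
    "a_pct_50_or_less": True,
    "affective_ratio_0_65_or_less": True,
    "no_f_minus_or_pure_c_on_card_ix": False,
    "fc_equal_or_greater_than_cf": False,
    "last_three_cards_40_to_60_pct": True,
}
print(sign_score(example), classify(example))
```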

The statistical analysis of the T.A.T. variables revealed that
over-achievers scored higher on need for achievement (significant beyond
the one per cent level), needs to be aggressive (significant just at
the two per cent level), and need to improve self or status
(significant beyond the one per cent level). Their needs were more
frequently related to activities initiated with regard to objects and
situations (significant just short of the five per cent level of
confidence). With a t of 1.0 or higher taken as indicator of a tendency,
over-achievers showed a tendency to more total needs and more
needs and press combined, and to give happy endings to their
stories. Under-achievers scored significantly higher (beyond the
one per cent level) in number of needs to be dependent, and t-ratios
higher than 1.0 but not to the point of statistical significance
suggest that they have a tendency to have more need to be free from
restraints.

Analysis of Card 2 of the T.A.T. revealed that these male college 
students readily identified the girl in this picture, eighty-four 
per cent of the over-achievers and eighty-five per cent of the under- 
achievers identifying her as a student. Of the under-achievers who 
told stories about the girl in a school situation, fifty-three per cent 
attributed her presence in school or college to parental pressure or 
other sources of motivation, and forty-seven per cent thought she 
was not interested in school or disliked it. The corresponding
percentage for over-achievers was zero in each case, and eighty-three
per cent of those telling stories in which the girl was placed in a
school situation saw her as positively oriented to school. Forty-five per cent
of the total group of under-achievers saw her as preferring to re- 
main in the home environment, and again none of the over-achiev- 
ers thought that; on the contrary, sixty-three per cent of them 
ascribed to her a feeling of wanting to escape the limitations of life 
on the farm. 

Card 2 seems an excellent instrument for revealing attitudes 
toward the school situation, i.e., like or dislike for school, value of 
a college education recognized or not, and basis of motivation for 
attendance at college. Attitudes toward environmental factors 
were also disclosed. 

No significant differences were uncovered by the Rosenzweig
Picture Frustration Study. Variables on which over-achievers
scored higher and the obtained t was 1.0 or greater suggest that this
group has a tendency to emphasize the assignment of the blame
in frustrating situations but that they are impunitive, i.e., assign
blame to circumstances which are unavoidable, and that in super-ego
situations they admit guilt but claim extenuating circumstances.
Variables on which under-achievers scored higher and for
which the obtained t-ratio was 1.0 or more but not to a point of
statistical significance, indicated that this group tends to emphasize
the frustrating object or problem in a frustrating situation, and
to be more intropunitive, i.e., to turn the aggression inward, but
minimizing the frustration while doing so. In super-ego situations
they tend to be more aggressive in their denial of blame
when charged with an offense.

No significant differences were obtained for any of the clinical
or validating scales of the MMPI or for an Anxiety Index or
Internalization Ratio (15). Obtained t-ratios of 1.0 or higher suggest
a tendency for over-achievers to score higher on the Hs scale and
under-achievers to score higher on the Mf scale.

No t-ratios as high as 1.0 were obtained on the six scales of the
Bernreuter Personality Inventory. 

On the Strong Vocational Interest Blank for Men no significant 
difference was found for either the Occupational Level scale or the 
Interest Maturity scale. For the OL scale the obtained t was 1.43
in favor of over-achievers. 

For all seven variables of the Borow College Inventory of Aca- 
demic Adjustment the direction of the difference between means 
was in favor of the over-achievers, with t-ratios higher than 1.0
in all but one instance, Part V, Mental Health, and significant 
beyond the one per cent level on Part III, Personal Efficiency in 
Planning and Use of Time. 


CONCLUSIONS 


The hypothesis that over-achievers among a group of college 
students predicted to attain average academic achievement in 
engineering curriculums would show common personality factors 
which would characterize them as a group and distinguish them 
from a group of under-achievers has been substantiated. Specifi- 
cally, it has been found that: 

(1) Over-achievers are less labile in their affective reactions, 
tend more towards constriction, and are more inhibited in emo- 
tional response to pleasurable aspects of the environment. Intel- 
lectual adaptivity is greater, the approach to problems is more 
cautious and concretistic, and intellectual control of emotional 
reaction in the face of strong outer stimulation is more effective. 
Need for achievement and improvement of the self or status is 
greater, and they are more motivated for college study, enjoy it 
more, and expect to get more from it. They are more efficient in
the planning and use of their time, and tend in general to be better
adjusted to the college situation. To a greater degree they chafe 
under environmental limitations as they have known them and 
relate attainment of an education to escape from those restric- 
tions. They show more needs to be aggressive, and less social skill. 

Expectations that they would show evidence of more feelings of 
inferiority, a more optimum amount of anxiety, and higher occu- 
pational level and maturity of occupational interests were not 
confirmed. The findings in these respects were not statistically sig- 
nificant, although they did lie in the anticipated direction. The ex- 
pectation that they would give evidence of more self-sufficiency 
was not confirmed. 

(2) Under-achievers are less intellectually adaptive, over-gen- 
eralize and over-extend the self, and show less intellectual control 
and repression of emotional reactivity. They over-react to en- 
vironmental circumstances, and in general show easy, labile affec- 
tivity. Establishment of rapport in social situations is easier, but 
they are more dependent in their attitudes towards others. Motiva- 
tion for academic achievement is weak; they tend not to enjoy the 
school situation, and to be unable to see the value of an education. 
They tend to see their own environment as a desirable one. 

In addition, results obtained in this study seem to justify the 
following conclusions: 

(3) Although the Rorschach Technique, individually administered
and subjected to statistical analysis by means of the t-test,
does reveal certain personality differences between academic over- 
and under-achieving groups, the testing of the individual variables 
for significance of difference between means is not adequate to 
reveal the more subtle personality differences nor the inter-rela- 
tionships which are sometimes thought to exist between Rorschach 
variables. 

(4) The Thematic Apperception Test, particularly Card 2, is a 
potentially more useful instrument for a research approach to per- 
sonality study of academic achievement than is the Rorschach. 
For intensive and definitive research, however, all of the T.A.T. cards 
should possibly be administered and subjects should not be asked 
to write their stories, if the most is to be obtained from this tech- 
nique. 

(5) The Minnesota Multiphasic Personality Inventory, Group 
Form, the Bernreuter Personality Inventory, the Occupational
Level and Interest Maturity scales of the Strong Vocational Interest
Blank for Men, and the Rosenzweig Picture Frustration Study,
when given and scored in the conventional manner, are not suitable 
instruments for differentiating groups of academic deviates. 

(6) The Borow College Inventory of Academic Adjustment, 
particularly Part III, offers sufficient promise as an effective instru- 
ment in prediction of college achievement to warrant its wider 
trial-use in individual counseling and group research for purposes 
of further exploring its worth. 


IMPLICATIONS FOR FURTHER RESEARCH 


Results obtained in this study suggest that additional research 
particularly along the following lines might prove fruitful. 

(1) Cross-validation, using larger samples, of the six Rorschach 
signs. It would appear that the sign, no F- or pure C on Card IX, 
showing a high degree of significance in this study, would be es- 
pecially useful in an individual counseling situation if it were found 
to hold up under cross-validation. 

(2) Another study, also using larger samples, aimed at a more 
comprehensive investigation of the Thematic Apperception Test 
as a discriminative technique for academic deviates would be highly 
desirable. Again, if the clear-cut and specific differences revealed 
for Card 2 were confirmed in a cross-validation study, this finding 
would have immediate applicability and utility in individual coun- 
seling, particularly when time available with the client is definitely 
limited by the situation, as it usually is, for example, in Veterans 
Guidance Centers. 

(3) Although the differences found on the Rosenzweig Picture 
Frustration Study were not significant, the suggestion that over- 
achievers tend to be more impunitive might be profitably investi- 
gated in another study. Impunitiveness has been interpreted as 
evidence of repression, an interpretation which is generally in ac- 
cord with the other findings in this study. Research using larger 
samples and trying out a time limit below the median eighteen 
minutes taken by the subjects in this study might prove interest- 
ing. 

(4) Findings for the Occupational Level key of the Strong Vocational
Interest Blank for Men suggest that further research might
reveal a standard score below 40 on that scale to be indicative of
under-achievement. This suggestion, if corroborated by additional
research, would again be highly useful in individual counseling, or
perhaps even in group prediction.


REFERENCES 


1) S. J. Beck, Rorschach’s Test: II. A Variety of Personality Pictures.
New York, Grune & Stratton, 1947. 

2) S. J. Beck, Rorschach’s Test: I. Basic Processes. New York, Grune & 
Stratton, 1949. 

3) S. J. Beck, Rorschach’s Test: III. Advances in Interpretation. New
York, Grune & Stratton, 1952. 

4) R. G. Bernreuter, Manual for the Personality Inventory. Stanford, 
California, Stanford University Press, 1935. 

5) H. Borow, Manual for the College Inventory of Academic Adjustment. 
Stanford, California, Stanford University Press, 1946. 

6) S. R. Hathaway and J. C. McKinley, Minnesota Multiphasic Person- 
ality Inventory; Manual, revised. New York, Psychological Corporation, 
1951. 

7) Marguerite R. Hertz, Frequency Tables to be used in Scoring Responses 
to the Rorschach Ink-blot Test, third edition. Cleveland, Ohio, Western Re- 
serve University, 1946. 

8) B. Klopfer and D. Kelley, The Rorschach Technique. Yonkers-on- 
Hudson, New York, World Book Company, 1946. 

9) B. V. Moore, and G. F. Castore, The Pennsylvania State College 
Academic Aptitude Examination, revised. The Pennsylvania State College, 
State College, Pennsylvania, 1947. 

10) H. A. Murray, Manual for the Thematic Apperception Test. Cam- 
bridge, Mass., Harvard University Press, 1943. 

11) H. Rorschach, Psychodiagnostics, a Diagnostic Test Based on Per- 
ception, translated by P. Lemkau and B. Kronenberg, fourth edition. New 
York, Grune & Stratton, Inc., 1949. 

12) S. Rosenzweig, Edith E. Fleming and Helen J. Clarke, Revised Scor- 
ing Manual for the Picture Frustration Study. Provincetown, Mass., Journal 
of Psychology Press, 1947. 

13) M. I. Stein, The Thematic Apperception Test. An Introductory Manual 
for its Clinical Use with Males. Cambridge, Mass., Addison-Wesley Press, 
1948. 

14) E. K. Strong, Jr., Manual for Vocational Interest Blank for Men. 
Stanford, California, Stanford University Press, January, 1951. 

15) G. S. Welsh, “An Anxiety Index and an Internalization Ratio for
the MMPI,” Journal of Consulting Psychology, 16: 65-72, 1952.











A RESEARCH CHECKLIST IN EDUCATIONAL 
PSYCHOLOGY 


PERCIVAL M. SYMONDS 


Teachers College, Columbia University 


For eleven years, from 1944 until the present time, the writer 
has been chairman of the departmental seminar and project con- 
ference in Psychological Foundations in Teachers College, Colum- 
bia University. This seminar, representing the most advanced work 
in the Department of Psychological Foundations, is a requirement 
for all doctoral candidates; and an acceptable report is one of the 
requirements for certification for the Doctor of Philosophy degree. 
It provides an opportunity for students to report on their doctoral 
problems or projects, and to benefit by discussion, suggestions and 
criticism from staff and members of the seminar. Parenthetically, 
it may be noted that at Teachers College the study on which the 
student works as a dissertation for the Ph.D. degree is called a 
problem, whereas the field work or plans with regard to an edu- 
cational enterprise, a report of which the student submits as one of 
the requirements for the Ed.D. degree, is called a project. The 
first report that the student makes in this seminar usually consists 
of his plan for the problem or project. Subsequent reports will 
register progress on the student’s research and will give the seminar 
the opportunity to think along with the student on the various 
problems that arise during the course of his investigation. 

Students are encouraged to take an active part in the discus- 
sions of the seminar, and they are judged in part by their contribu- 
tions to discussion. 

Following the seminar each week the departmental staff meets 
to discuss the student’s presentation, and evaluations are made of 
the report. Typically the staff appraises the student’s plans and 
makes recommendations for his prosecution of his study. 

A report of the discussions in the seminar, prepared by the chairman
each week, includes the outline of the study which the student
has prepared for the seminar, and records all questions raised, the
discussion around these questions, and the deliberations
and conclusions reached by the staff.

The checklist of questions which constitutes the body of this
paper has been derived from these weekly reports. Questions raised
during each seminar presentation were taken off on slips and then 
rearranged by topics. This checklist may serve as an inventory of 
the kind of issues that arise in planning a research study in con- 
temporary educational psychology and should be of service to stu- 
dents who are planning research projects. If a student answers for 
himself the questions raised in this checklist, he should be able to 
satisfy the questions which are likely to be put to him by members 
of an educational psychology faculty during a seminar report. The 
checklist ought also to be of some value to those who give instruc- 
tion in research methods as an outline of the various issues which 
might be presented in such a course. 


QUESTIONS TO BE CONSIDERED IN PLANNING A DOCTORAL 
DISSERTATION OR PROJECT IN PSYCHOLOGICAL 
FOUNDATIONS 


This outline is based on questions, comments and suggestions
made in Education 307SC-308SC (Departmental seminar and project
conference in educational psychology, developmental psychology,
remedial reading in elementary school and psychology of
the elementary school subject, social psychology and educational
measurement and evaluation), Teachers College, Columbia University,
1944 to 1955.

The following outline should serve as a checklist of the more common
queries raised in the critical evaluation of proposed research
studies.


A. Scope and Definition of Study 


1. Is your problem being considered broadly enough? 

2. Have you sufficiently limited your problem? 

3. What are the educational implications of your study? 

4. Have you governed your decisions by the experiences of 
investigators who have preceded you? 

5. How significant psychologically is a socially selected group
such as “children who steal,” “children who nail bite,” or “inmates
of a correctional institution”?


B. Hypotheses 


1. What are the hypotheses?

2. Are the hypotheses promising?

3. Are the hypotheses clearly and precisely stated?

4. Are the hypotheses stated in a form that permits them to be
tested?

5. Is it better to hazard a hypothesis or to ask a question?

6. Has the study been restricted to one or a few principal
hypotheses to be tested?

7. Are your hypotheses independent of one another?

8. Is it better to hypothesize causal factors or merely to
hypothesize relationships?


C. Background 


1. Have you made a thorough, careful review of the literature
pertaining to your problem?


D. Definitions 


1. Have proper distinctions been made between concepts?

2. Have concepts been adequately analyzed so as to distinguish
between small but significant differences in method, materials,
subjects, setting, etc.?

3. Is there clear and unequivocal meaning in the use of your
terms?

4. Are concepts adequately and accurately defined?

5. Do some of your concepts require restrictive definition?

6. Do the meanings of your terms change with changes in age,
sex, socio-economic status, etc.?

7. From what (whose) point of view are you defining your
terms?


E. Method of Study 


1. Has a decision as to the method of inquiry been made?

2. Is there a relation between the data to be collected and the
hypotheses which the study is trying to test?

3. Do you propose to collect your own data or will you make
use of published data already gathered?

4. Are you planning to use a “shotgun” approach?

5. If you plan to study individual cases, have you given thought
as to how you will go from cases to general conclusions?

6. Are the data necessary for your study available?

7. Are you in a position to secure the data necessary for a
successful prosecution of your study?

8. Are there variations in the method whose results would be
worth investigating?

9. When more than one investigational approach is available, is
it worth while to compare the results using different methods?

10. Can you draw conclusions as to cause and effect from
evidence as to relationship?

11. To what extent can you generalize from a single experimental
situation?

12. Are you planning to draw general conclusions from a study
limited in age, sex, social class, race, etc.?

13. Have you considered the desirability of studying the
relationships covering the whole range of your population
instead of studying contrasting extreme groups?

14. Is the range among your cases sufficient to permit you to
demonstrate the relationships in which you are interested?

15. Is it possible that the differences or relationships in which
you are interested might be due to differences inherent in
the situation (personality differences, for instance) instead
of or as well as the experimental factors you propose
to study?

16. How do you propose to select your criterion groups?

17. Can you make detailed observations of individual subjects
to supplement your mass data?

18. Would it be well to ask questions on both the positive and
negative side of an issue instead of just one side?

19. Are you justified in assuming a constant motivation among
your subjects?

20. Will your results be influenced by the order or position of
your materials?

21. Have you considered the possibility of response sets in your
subjects; i.e., a general tendency to respond, which might
influence your results?


F. Design


1. Is the design of your study clearly formulated?

2. Have you taken into account the various hidden factors
which might influence the results of your study besides the
variables that you are specifically planning to study?

3. Have you taken into account the influence of age, sex, school
grade, I.Q., socio-economic status, mental set, leakage
among subjects, influence of examiner, emotional factors?

4. Have you taken sufficiently into account experiences other
than your experimental variables that might intervene
between your first and second test?

5. How may the influence of these variables be eliminated?

6. Have you made a decision with regard to whether it is better
to control variables experimentally or to test the contribution
of variables statistically?

7. Have you given sufficient thought to the necessity of
controls?

8. Are you in a position to measure and/or control variables
which might influence your results?

9. Does the design of your experiment permit you to randomize
variables which you do not want to influence your results?


G. Sampling 


1. To what extent will you be able to generalize your findings?

2. What are the criteria for selecting your cases?

3. Is sampling of materials, places, subject, test items, etc.
adequate?

4. How do you propose to select subjects for your study?

5. Are your proposed subjects representative of the population
to which you intend to generalize your results?

6. What age level and age range do you propose to study?

7. Have you given sufficient attention to the sampling of tasks?

8. What factors may be biasing the selection of subjects?

9. Do you propose to use groups already selected by social
processes or do you propose to select groups by direct testing of
the characteristics of individuals?

10. Would your results be more clear cut if you selected groups
which give promise of showing wide differences?

11. How stable will your findings be; that is, will they stand up
when made under different conditions, with other subjects,
materials, instructors, examiners, in other places, etc.?

12. Will you make your selections on the basis of known factors
or on the basis of a random sampling?

13. Should you treat as a unit individuals over a wide age, grade
or cultural range?

14. Can you simplify your problem by limiting it to a narrower
age range, grade range, geographical range, etc.?

15. Are you taking into account the sub-groups in your total
population?

16. How do you propose to group your subjects?

17. In studying the differences between two populations, are you
sure the populations are differentiated on the variables which
you assume differentiate them?

18. Are you determining your sampling or are you letting others
select cases for you?

19. If you depend on voluntary participation for your subjects,
have you given consideration to what this will do to your
sampling?

20. How do you plan to determine the equivalence of groups?

21. If not all of those whom you solicit return questionnaires,
what does this do to your sampling?


H. Studying Personality 


1. Are you differentiating between manifest and covert
personality trends?

2. To what extent is it possible to consider motives and
unconscious purposes?

3. How do you propose to determine the values, attitudes,
interests, motives, etc. that you plan to investigate?

4. Will interviews be biased if the interviewer has access to
other information about an individual?

5. Can you depend on the subject’s report to inform you about
his motives, attitudes, problems, etc.?

6. Are you sufficiently aware of the influence of dynamic
mechanisms (repression, projection, denial, etc.) in the reports of
your subjects?

7. How are you going to ensure that your subjects express their
true feelings and attitudes?

8. If you plan to investigate underlying dynamics, are you
prepared to establish close relationships with your subjects?

9. Is the questionnaire or testing approach going to permit you
to get at the inner dynamics?

10. In a study using tests and other objective data, is it possible
to question your subjects further to determine attitudes,
beliefs, motives, etc.?

11. Are you assuming that a verbal report of behavior is identical
with the behavior?

12. Are you assuming that a verbal report of attitude is identical
with the attitude?

13. Are you confounding a verbal report of memory with the
actual facts?

14. Is the recall of childhood experiences a safe index of the
actual experiences?

15. Which is preferable for your study, the questionnaire or
interview approach?

16. In a study involving interviewing, have you considered the
possibility of securing coöperation and frankness from your
subjects?

17. To what extent is the interviewer influencing the answers to
questions?

18. To what extent can you depend on self-testimony as
compared with the testimony of others?

19. Are you assuming that inner expression of value, interest
and feeling are the same as the judgment of a person made
by some other person?

20. Is one justified in attributing dynamic significance to
behavior for the purpose of research?

21. Do you prefer the free response type of inquiry (so-called
open-ended questions) where the answers to questions must
be categorized, or do you prefer recognition type questions
where the answers can be directly tabulated?

22. In using observation material, how can you be sure that
your observations are representative?

23. Have you made sure that the items which you have selected
to represent a given trait are not biased in some other trait?

24. In thinking about personality characteristics, are these
general or do they relate to specific situations?


I. Tests and Measures 


1. Are you using the appropriate measures?

2. How is the material in your test to be selected?

3. Are your measures independent?

4. Have you taken into account the difficulty level of your test?

5. Do you wish to use a time-limit or work-limit?

6. Have you considered the relative merits of recall and
   recognition types of items?

7. Have you given consideration to the best type of score for
   your test?

8. How will you construct a scale to measure attitude?

9. Do you have a large enough reservoir of items to be able to
   make a selection of items in terms of difficulty, validity, etc.?

10. How do you propose to make a composite of your measures?
    What would such a composite mean?

11. In using a questionnaire, do you propose to get a total score
    or merely to tabulate the answers to separate items?

12. To what extent are the responses of your subjects limited by
    the examiner or the directions?

13. Can you use a ready made measuring instrument or should
    you construct your own? If the latter, how will you
    determine its validity and reliability?

14. How valid are your measures?

15. Are there contaminating factors (age, sex, etc.) which might
    lessen the validity of the test you propose to use?

16. Do you propose testing the validity of your test on a new
    group?

17. Are your tests sufficiently reliable?

18. Do you propose to test the reliability of your measures?

19. Are you planning to use your time in scale making, testing
    validity and reliability when it could more profitably be
    directed elsewhere?

20. Are there norms with which to compare the findings in the
    group which you are proposing to study?

21. If you plan to use content material (like the T.A.T.) as a
    measuring device, have you specified the variables you
    propose to measure?

22. Have you properly disguised your intentions in the test you
    propose to use?

23. Should a pre-test be given as well as a test at the close of
    your experiment?

24. Is a pass-fail score on a test item identical with a qualitative
    score based on a content analysis?

25. Is a point score which is the sum of factors used in the
    solution to a problem adequate as a measure when there is a
    possibility that the factors may differ in importance?





26. If you plan to study the extremes on your scale, would it be
    well at the same time to pay attention to the intermediate
    points?


J. Use of Judgment 


1. If you plan to use judgments, have you specified the basis
   on which your judgments would be made?

2. In securing ratings, whom will you select as your judges?

3. If you plan to use judgments, are you sure your judges have
   the necessary intelligence, information, background and
   other qualifications to permit them to make the judgments?

4. Can you depend on judgments to serve as a basis for selecting
   individuals as being mentally deficient, delinquent,
   schizophrenic, etc.?

5. To what extent will subjective factors enter into judgments
   that you propose to make (or use) and how can these be
   avoided?

6. Which is better for your study, the overall judgment or a
   content analysis?


K. Content Analysis 


1. Who is to make the classification scheme?

2. How do you propose to determine your classification scheme?

3. How reliable will the classification scheme be?

4. How do you propose to distribute material into the several
   classificatory categories?

5. Who is to distribute the items into categories?

6. Have you provided for determining the reliability of your
   classifications?

7. Have you sufficiently taken into account the subjective
   factor in analyzing content material?

8. Have you given attention to what sorts of behavior or
   responses will be included under such general categories as
   “resistance” or “rejection”?


L. Statistical Handling of Results 


1. Do the conditions of your data warrant using the statistics
   which you propose to use?

2. Do your data satisfy the assumptions on which the statistical
   constants which you propose to use are based?

3. Have you enough cases to test fairly the significance of your
   statistical constants?


M. The Ed.D. Report 


1. In preparing an Ed.D. project, have you considered the
   audience for which you will be writing?

2. In planning a Type B¹ project, are facts available concerning
   the situation in which the plan would operate?

3. In a Type B project, how will you determine the principles
   which will serve as the basis for your plan?

4. Have you given thought as to how you will defend your
   proposals in a Type B project?

5. In preparing a handbook, have you considered the manner
   of presentation?





¹ In Teachers College, Columbia University, a Type B project which
may be submitted as partial fulfillment of the requirements for the Ed.D.
degree is a plan for the improvement of education in some specific
situation.








GENERAL RESPONSE PATTERN TO 
FIVE-CHOICE ITEMS 


EDWARD L. CLARK 


Northwestern University 


Response tendencies of individuals to multiple choice test items 
have recently been the subject of several investigations. The ten- 
dency of some individuals to respond in a given direction, such as 
the selection of end or extreme responses in four or five choice items, 
has been studied by Berg and others (2, 3) to determine what as- 
sociation, if any, such tendencies may have with personality types 
or with clinical diagnoses. These studies have to do especially with 
the willingness of subjects to select end responses such as “dislike 
very much” or “like very much” and the evidence collected indi- 
cates that responses to such multiple choice items may be an in- 
dividual expression of set. 

Although tendencies to choose or avoid end responses have been 
observed in non-verbal tests of standard option form, similar stud- 
ies in conventional multiple-choice tests of knowledge of subject 
matter or scholastic aptitude are rare. One hears informal opinions 
expressed occasionally to the effect that students neglect the third 
position or the fourth position of five-choice tests, and more rarely 
such opinions get into print. Atwell and Wells (1) state that “where
the answers are pure guesses the position of misleads is a distinct 
factor in the ones which are selected. On such a selection a definite 
pattern is found, positions one, five, three, four, two, being chosen 
with decreasing frequency in that order.” No supporting data ac- 
company this assertion. McNamara and Weitzman (6) find that 
items having right answers in the fourth position are the most diffi- 
cult, those with right answers in the second and third positions 
are the easiest, and the first and fifth positions are of equal inter- 
mediate difficulty. Their results for four-choice items show that 
the third position is the most difficult, and this finding is inter- 
preted as agreeing with the results for five-choice items in that the 
next-to-last position is the most difficult. Otherwise, there seems 
to be little agreement between findings for positions for five-choice 
and four-choice items. To exclude the possibility that item writers, 
responding inadvertently to some common belief, had placed the 




more difficult right answers in the next-to-last position, a compari- 
son was made between such items in one form of tests with the same 
items in other forms where the right answer was not in the next-to- 
last position. This comparison did, indeed, show that items were 
slightly easier when the answers occurred in other than the next-to- 
last position. These investigators did not make this definitive test 
for any of the other positions. Cronbach (5) expresses the opinion 
that the five-choice type of item is, as compared to the two-choice 
item, quite free from the influence of position of the right answer. 

The purpose of this study is to investigate more fully than has 
been done in previous studies a possible tendency among college 
students to prefer certain response positions to the neglect of others 
in five-choice tests. The preliminary findings were collected during 
the eight-year period when the Clark Analogies Test was being 
constructed (4). Because both the right answers and the wrong 
answers were placed by casting dice, it was hoped that no repeating 
pattern of right answers nor special placement of attractive dis- 
tractors characterized the early editions of this test. However, the 
relatively small number of items used—about five hundred during 
the eight years—did not preclude the possibility that the rare dis- 
tractor of great attractiveness should happen to occur more often 
in one position than another. Consequently it was deemed neces- 
sary to collect distributions of wrong answers from three additional 
tests: N. U. Form A, a scholastic aptitude test made of two hundred 
difficult word definitions; three forms of the Ohio State University 
Psychological Test; and three subtests of The Guilford-Zimmerman 
Aptitude Survey. 

The data in Table I are based, with the exception of Experi- 
mental Analogies I and II, on the wrong answers made by appli- 
cants for admission to Northwestern University. The entries of 
the table give the ratio of observed to expected numbers of wrong 
answers for each of the five positions of the several tests. To de- 
termine the expected number for each position, one-fourth was 
taken of the wrong answers to all items having the right answers 
in the other four positions. For example, the expected number of 
wrong answers in the first position was one-fourth of all wrong 
answers given to items whose right answers were in positions two, 
three, four, or five. Because the total number of wrong answers
was not the same for each set of items, and the expected number of
wrongs was therefore not the same in each position, the unweighted
average of the five ratios of observed to expected wrong
answers for each test will not necessarily be 1.0000.


TABLE I. DATA ON THE DISTRIBUTION OF WRONG ANSWERS IN THE FIVE
POSITIONS OF MULTIPLE CHOICE TEST ITEMS

                               Ratios of Observed to Expected Numbers
                      No. of   of Wrong Answers for the Five Positions   No. of Wrong   Approx. No. of
Name of Tests         Items      1.      2.      3.      4.      5.      Answers        Examinees

Clark Analogies         500    1.1010  1.0290  1.0136  0.9895  0.8692      92,822          2000
N.U. Form A             200    1.0485  1.1681  1.1070  0.8044  0.8560      41,531           400
Ohio Psychological      450    1.0866  1.0532  1.0080  0.9496  0.9028      44,776           640
Guilford-Zimmerman      146    1.0406  1.0504  1.0940  0.9923  0.8233      42,577          1200
Exp. Analogies I        (30)   0.9816  0.9913  1.0009  1.0224  1.0099       5,224           628
Exp. Analogies II       (55)   1.0974  1.0174  1.0001  0.9112  0.9691      11,423           400
C. Anal. Forms A & B:
  Given first          (300)   0.9964  0.9996  0.9835  1.0429  0.9814      15,026           272
  Given second         (300)   1.0736  0.9973  0.9543  1.0134  0.9600      15,021          (272)

Totals                1,296    1.0703  1.0538  1.0335  0.9568  0.8837     268,400         5,540
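
The rule for expected numbers can be sketched in a few lines of Python.
This is an illustration only: the function names and the totals in
toy_wrong_by_key are invented for the example and do not come from the
study. The one pair of real figures used is the count reported in the
Results for position one of the analogies test (20,571 observed wrong
answers against 18,684 expected).

```python
def expected_wrong(wrong_by_key, position):
    """Expected wrong answers in `position`: one-fourth of all wrong answers
    given to items whose right answer lies in one of the other four positions."""
    others = sum(count for key, count in wrong_by_key.items() if key != position)
    return others / 4


def observed_to_expected(observed, expected):
    """Ratio of the kind entered in Table I for one position of one test."""
    return observed / expected


# Hypothetical totals of wrong answers, grouped by the position of the right answer.
toy_wrong_by_key = {1: 400, 2: 380, 3: 420, 4: 390, 5: 410}
toy_expected_pos1 = expected_wrong(toy_wrong_by_key, 1)

# Counts reported in the Results for position one of the analogies test.
ratio_pos1 = observed_to_expected(20571, 18684)
print(toy_expected_pos1, round(ratio_pos1, 4))
```

Because each position's expected number is drawn from a different set
of items, the five expected numbers differ slightly from one another,
which is why the plain average of the five ratios need not equal 1.0000.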


RESULTS 


The results shown on the first line of Table I are based on the 
numerous early editions of the analogies test in which approxi- 
mately five hundred different items were tried out in tests no longer 
than one hundred and fifty items each. The ratio, 1.1010, of ob- 
served to expected number of wrong answers for position one was 
based on the twenty thousand, five hundred and seventy-one wrong 
answers made in this position to the eighteen thousand, six hundred 
and eighty-four wrong answers expected: the expected eighteen 
thousand six hundred and eighty-four is one-fourth of all wrong 
answers given to all items having right answers in positions two,
three, four, or five. The remaining ratios of observed to expected
number of wrong answers show a declining sequence. In the five
positions, ninety-two thousand, eight hundred and twenty-two 
wrong answers were tabulated from the papers of approximately 
two thousand examinees. A chi-square based on the five pairs of 
observed and expected wrong answers was found to be five hundred 
and thirty-nine or about forty times the value needed to be signifi- 
cant at the one per cent level of confidence. In the second line, the 













ratios depart from 1.0000 in a very significant manner but they 
do not exactly parallel those of the first line. 
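
The chi-square comparison for the first line of the table can be
checked roughly as follows. This is a sketch under stated assumptions:
the article does not give the degrees of freedom, so four (five
positions less one) is assumed here, and 13.277 is the usual tabled one
per cent point for four degrees of freedom. Only the position-one
counts are printed in the text, so the full statistic of 539 cannot be
recomputed; the code shows the contribution of that one position and
the size of the reported value relative to the critical value.

```python
def chi_square(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over the paired counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Tabled 1 per cent point of chi-square, ASSUMING 4 degrees of freedom.
CRITICAL_1PCT_DF4 = 13.277

# Contribution of position one alone, from the counts given in the text.
position_one_term = chi_square([20571], [18684])

# How many times the reported chi-square of 539 exceeds the critical value.
multiple = 539 / CRITICAL_1PCT_DF4
print(round(position_one_term, 1), round(multiple, 1))
```

Under that assumption the reported value is roughly forty times the
critical value, which agrees with the statement in the text.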

Attention is called to the fact that N. U. Form A was a very hard 
test of two hundred items. The four hundred examinees all took 
the entire test and on the average each made over one hundred 
wrong answers. The results on the third line are based on the three 
editions of the Ohio State University Psychological Test of one 
hundred and fifty items each and the ratios obtained depart from 
1.000 in a manner similar to that found for the analogies tests. 
The data of the fourth line, derived from the three short subtests 
of the Guilford-Zimmerman battery, again show ratios, departing 
considerably from 1.000, which are statistically stable. 

Although the first four groups of tests did agree in showing rela- 
tively more wrong answers in the first positions than in the last— 
a tendency of great statistical stability—each test differed signifi- 
cantly from each other one. To understand this lack of agreement 
in results among the tests, it is necessary to examine the possible 
effects of the placement of unusually attractive misleads. For most 
of the tests used in this study the average item was answered cor- 
rectly about sixty per cent of the time. Thus, the four wrong options 
each got an average of about ten per cent of all the responses. There 
were, however, a few items which had very attractive misleads 
which alone drew responses from fifty or even sixty per cent of the 
examinees. Even when the position of all options is determined by 
dice, there is little reason to expect that the relatively small number 
of items with very attractive misleads will be evenly distributed 
in the five positions of any one test. Getting tens of thousands of 
wrong responses from examinees only serves to establish with great 
stability the less than symmetrical distribution of attractive mis- 
leads. The results obtained for the N. U. Form A test were espec- 
ially subject to the chance distribution of attractive misleads. 
Since popular misconceptions of the meanings of words were fre- 
quently used as wrong options in this test, every eighth or tenth 
item had a mislead which was used more often than the right an- 
swer. Much of the lack of agreement among the first four groups of 
tests as to the relative attractiveness of the five positions must be 
due to differences in the chance distribution of the very attractive 


wrong answers. 
In order to control rigidly the chance placement of very attrac- 
tive wrong options, Experimental Analogies I was designed. Thirty 













items were used, six having right answers in each of the five posi- 
tions, and four forms of the test were made. The right answer of 
each item always occurred in a given position but in the four forms 
the items had slightly different order on the page and the position 
of the wrong options were systematically varied according to a 
Latin square: each wrong option occurred once in each position 
and always followed a different wrong option. 
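
One way to obtain an arrangement with these two properties is sketched
below. The article does not print the square that was used, so the
construction here (the standard balanced square for an even number of
treatments) is an assumption, as are the function names and the labels
w1 to w4. "Followed" is taken to mean the order among the wrong options
themselves, with the right answer held in its fixed position.

```python
def balanced_latin_square(n):
    """Balanced Latin square for even n: each symbol appears once in every
    column, and every ordered pair of neighbours appears exactly once."""
    first, low, high = [0], 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    return [[(x + shift) % n for x in first] for shift in range(n)]


def build_forms(wrong_options, key_position, right="RIGHT"):
    """Lay out one item in four forms, keeping the right answer at
    key_position (1 to 5) and rotating the wrong options around it."""
    forms = []
    for row in balanced_latin_square(len(wrong_options)):
        options = [wrong_options[i] for i in row]
        options.insert(key_position - 1, right)
        forms.append(options)
    return forms


# Hypothetical item whose right answer is always in position three.
forms = build_forms(["w1", "w2", "w3", "w4"], key_position=3)
for form in forms:
    print(form)
```

With the right answer fixed, each wrong option visits each of the four
remaining positions once across the four forms, so an unusually
attractive mislead cannot pile up in one position.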

The results of Experimental Analogies I when given to six hun- 
dred and twenty-eight students of elementary psychology are 
shown in line five of the table. The differences between observed 
and expected numbers of wrong answers for this one-page test were 
amazingly slight: a chi-square test of the data indicated that a 
random drawing of five thousand, two hundred and twenty-four 
cases of five equally probable events would in over ninety-five cases 
out of one hundred show greater departures than our obtained 
ratios. To assume any tendency of these six hundred and twenty- 
eight students of elementary psychology to give preference to any 
of the five positions of this test would be entirely unwarranted. 

In order to explore further this remarkably even distribution 
of wrong answers, the following year a longer experimental analo- 
gies test having fifty-five items was constructed in four forms using 
varied order of items and, again, a Latin square design for the 
placement of the wrong options was used. The results from this 
fifty-five-item test administered to four hundred elementary psy- 
chology students gave the ratios shown in the sixth line. Although 
the Latin square design was used just as it had been used the year 
before, the departures of the ratios from chance expectation were
highly significant (chi-square was three times the value required 
for significance at the one per cent level) and resembled roughly 
those departures from chance which had been found earlier for 
the four standardized tests. The neglect of position four to be ob- 
served in Experimental Analogies II was found, upon examination 
of the data, to be due entirely to what happened for nine of the 
eleven items which had their right answers in position three. That 
is to say, the eleven items having right answers in position one, the 
eleven having right answers in position two, and the eleven with 
right answers in position five did not show a neglect of wrong an- 
swers placed in position four. 

The results from the two experimental analogies tests are not 













in agreement with each other although both used the Latin square 
design for the position of wrong options. All conditions for the col- 
lecting of data for the two tests were alike, apparently, except for 
the number of items in the tests. Nothing was said at either testing 
about a need for haste in finishing the test, but the fact that the 
first test was on one page and the second test was on two pages must 
have affected the attitude of the subjects towards it. Although the 
class period provided ample time to do all fifty-five items, an oc- 
casional student must have felt hurried and must not have read 
beyond the option which he considered the right one. 

Since the results from Experimental Analogies II suggested that 
a feeling of haste tended to make examinees neglect positions four 
and five, it was reasoned that subjects might also neglect the last 
positions because of fatigue or boredom. To test this hypothesis, 
data were used which had been gathered from two hundred and 
seventy-two applicants who had taken Form A and Form B at one 
testing session. One-half had taken Form A first and one-half Form 
B first; all took both tests with a ten-minute rest interval between 
the two ninety-minute periods of work. Since both Form A and 
Form B occurred an equal number of times as the first test (and as 
the second test), all the attractive wrong answers could have oc- 
curred in one position, e.g., in position four, without invalidating 
the comparison of the results from first and second testings. The 
chance placement of attractive wrong answers was exactly the same 
in the two periods of testing and, thus, ruled out of this comparison. 
The third line from the bottom of the table of data shows the use 
of the five positions for wrong answers when either Form A or Form 
B was the first test taken, and the next line below shows what hap- 
pens when either of these two tests was the second taken. It will 
be noted that the total number of wrong responses was almost ex- 
actly the same on second taking as it was on the first: any practice 
effects must have been hidden by the results of fatigue or boredom. 
(The r between the two forms was 0.92.) The differences in the use
of the five positions for wrong answers on taking the second form
of the test, as contrasted with the first form, were as follows. The
first position was used two hundred and forty-two more times, and
all the remaining positions were used fewer times (eight, ninety,
eighty-three, and sixty-six, respectively). Thus, the first position
was used much more often on the second taking of a form of the 










test and the last three positions were used less often. A chi-square 
test for these differences indicated significance at the two per cent 
level of confidence. 

In summary, the great mass of data of the study which is based 
on the four standardized tests (lines one to four in the table) indi- 
cates that there was some neglect of the last positions in five-option 
multiple choice items. The lack of complete agreement between 
tests as to positional preferences of students is probably due to the 
chance placement of the relatively few very attractive misleads or 
wrong options. Data from the relatively small groups of subjects 
(lines five to eight of the table) suggest that the pressure of time 
limitations and fatigue or boredom are effective in making a de- 
clining frequency of the use of the five positions. 

The general results of this study, as summarized by the last line 
of data, for all tests, show a tendency of a few subjects to neglect 
the last positions—a failure to read all the way through an item 
before making a response. To assume that once in one hundred 
items students read only the first two options, once in a dozen items 
they read but three options, and once in twenty or twenty-five 
they read but four options would give results much like those in 
the last summary line of the table. The data suggest that under 
usual testing conditions with a time pressure and, in long tests, a 
certain amount of boredom, students will in less than ten per cent
of the items read the first three options only; in very rare cases
they read only two options; and, again rarely, they fail to read the
last option.
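
The partial-reading account above can be turned into a small
calculation. The model below is an assumption made for illustration
and is not stated in the article: a wrong answer is taken with equal
chance from whichever options were read, and the fact that one of those
options is the right answer is ignored. The proportions are the ones
named in the text, with "once in twenty or twenty-five" taken as 1/22.

```python
def position_ratios(p_two, p_three, p_four):
    """Observed-to-expected ratios for positions 1 to 5 when a wrong answer
    is drawn with equal chance from the options actually read."""
    p_five = 1 - p_two - p_three - p_four      # items read all the way through
    read_prob = {2: p_two, 3: p_three, 4: p_four, 5: p_five}
    shares = []
    for position in range(1, 6):
        # A position can be used only when at least that many options are read.
        share = sum(p / k for k, p in read_prob.items() if k >= max(position, 2))
        shares.append(share)
    return [5 * s for s in shares]             # chance share is one-fifth each


# Proportions suggested in the text (1/22 is an assumed midpoint).
ratios = position_ratios(1 / 100, 1 / 12, 1 / 22)
print([round(r, 3) for r in ratios])
```

The resulting ratios fall from above 1 in the early positions to below
1 in the last, the same general shape as the totals line of Table I.
The first two positions come out equal under this simple model, so it
should not be expected to reproduce the table closely.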


CONCLUSIONS 


(1) General positional preferences as they exist with five-choice 
items are weak. 

(2) When the relative attractiveness of misleads is experiment- 
ally controlled and when examinees are under no time pressure to 
finish, positional preferences are, indeed, infinitesimal. 

(3) Working under the usual pressure of time, subjects use the 
five positions in a slightly declining frequency. 


REFERENCES 


1) C. R. Atwell and F. L. Wells, “Wide Range Multiple Choice Vocabulary
Tests,” Journal of Applied Psychology, 21: 550-555, 1937.

2) Irwin A. Berg, “The Reliability of Extreme Position Response Sets
in Two Tests,” Journal of Psychology, 36: 3-9, 1953.

3) Irwin A. Berg and J. S. Collier, “Personality and Group Differences
in Extreme Response Sets,” Educational and Psychological Measurement,
13: 164-169, 1953.

4) Edward L. Clark, Clark Analogies Test, Forms A & B. Evanston,
Illinois, 1953.

5) L. J. Cronbach, “Response Sets and Test Validity,” Educational and
Psychological Measurement, 6: 475-494, 1946.

6) W. J. McNamara and E. Weitzman, “The Effect of Choice Placement
on the Difficulty of Multiple-Choice Questions,” Journal of Educational
Psychology, 36: 103-113, 1945.








STIMULUS FAMILIARIZATION AS A FACTOR 
IN IDEATIONAL LEARNING¹


WALTER WEISS and BERNARD J. FINE 


Boston University 


A variety of research has indicated that the association of res- 
ponses with stimuli is facilitated by prior familiarization with the 
stimuli. For example, Lawrence found that the learning of a new 
instrumental response is faster with a familiar cue than with one 
to which the animal had not learned to attend (3, 4). Other investi- 
gators have demonstrated that the prior association of irrelevant 
responses with critical stimuli facilitated the later learning of the 
desired responses with those stimuli (1, 5, 6). Stimulus familiarity
has also been shown to facilitate serial-order learning (2) and ta- 
chistoscopic recognition (7). While humans have been used as sub- 
jects in most of the studies, the problems presented were not com- 
plex ones and did not require much use of the implicit symbolic 
mechanisms called for in learning ideational material from mass 
media. 

The main objective of the present research was to test the pre- 
diction that prior familiarization with key terms will facilitate 
learning from an informational communication. A second purpose 
is related to a pragmatic consideration. Since the familiarization 
procedure takes time, is it more advantageous to expose the learners 
twice to the communication without prior familiarization than 
once preceded by familiarization training? 


METHOD 


Experimental Conditions and Materials. Three groups were 
formed comprising a familiarization-training group, a one-showing 
control group, and a two-showing control group. Hereafter, they 
will be designated respectively as E, C1, and C2. All three were
exposed to an edited version of a film-strip on the organization of 
the United Nations, called “Structure for Peace.” The structures 
and functions of the six main subdivisions of the UN were de- 





¹ This research was supported by the United States Air Force under
Contract No. AF 18(600)-1210, monitored by the Training Aids Research
Laboratories, Chanute Air Force Base.




scribed in some detail in a recorded script which was synchronized 
with the film-strip. For C1 this completed the learning aspect of 
the session. C2 was immediately re-exposed to the communication. 
The E group saw the film-strip only once but immediately prior
to that underwent a familiarization-training procedure. It was only
after completing this procedure that the E classes were informed
that they were to view a film-strip on the UN. 

The visual material used in the familiarization procedure con- 
sisted of six of the twenty-four content-related frames of the pre- 
pared film-strip. Each of the frames was a drawn representation of 
a major subdivision of the UN but was bare of any identifying
material.² A paired-associates procedure modified for group
presentation comprised the familiarization technique. As each frame was
shown, a recorded voice indicated the correct identifying name for 
that subdivision. The Ss simply listened and learned during this 
sequence. Then, the frames were presented again, but without the 
recorded identification of the names. This time the Ss had eight 
seconds in which to select the name of the appropriate organ from 
among the six printed on a prepared sheet. They did this by placing 
the number 1 next to the judged name for the first frame shown, etc. 
The recorded voice called out the number of each frame as it was 
shown. This arrangement of a listening trial and then an answering 
trial was repeated six times. The order of the frames was 
not changed from a listening to an answering trial; but from one 
listening-answering sequence to the next, the order was changed 
in accordance with a predetermined six by six Latin square ar- 
rangement. The familiarization training took eleven minutes, which 
was equal in time to that needed for one showing of the informational
film-strip. Thus, both the E and C2 groups spent the same
amount of time in the learning aspect of the session. 

Before the film-strip was presented, the Ss were told that they 
would be tested immediately afterwards on its content. The after- 
questionnaire contained twenty multiple-choice items related to 
the structures and functions of the six main parts of the UN and 





² When a subdivision of the UN was described in the film-strip, the
relevant frame from among the six used in the familiarization procedure
was always the first one shown.

³ Approximately eighty per cent of the Ss in the E classes responded
without error on the last two answering trials and approximately ninety
per cent were correct on the last trial.










five additional questions on the UN covering general knowledge 
not presented in the film-strip. These five were designed to provide 
a rough estimate of the Ss’ pre-existing levels of information about 
the UN. A few opinion items on reactions to the film-strip were 
also included. 

Subjects. Approximately seven hundred upperclass students from 
three high schools in the Boston area were tested. Since intact clas- 
ses were used, the N for the statistical analyses is based on classes 
as the sampling units. Eleven were assigned to the E group, ten
to the C1 group, and seven to the C2 group. Two or three classes
were examined during each school hour. When two classes were 
tested, they were assigned randomly to the E and C1 groups; other- 
wise they were assigned at random to the three treatments. One 
C1 class was eliminated as a disciplinary problem. The number of 
Ss in each class varied from fourteen to thirty-four. 


RESULTS 


Before examining the effect of the treatments on learning from 
the film-strip, it is important to determine if differences existed 
between the groups in their prior fund of knowledge about the UN. 
A rough index of knowledgeability is given by the mean scores on 
the general information questions in Table I. None of the differ- 
ences between the groups is significant. 

Table I also contains the means for each group on the twenty 
fact-quiz items related to the content of the film-strip. Significant 
differences are obtained between the E and C1 groups (p < 0.05,
one tail) and between the C2 and C1 groups (p < 0.05, two tails).
Furthermore, these differences hold for the “easy” questions as
well as the “difficult” ones. The means of the E and C2 groups do
not differ reliably from each other (p = 0.50, two tails). 


TABLE I. MEAN NUMBER OF FACT-QUIZ ITEMS CORRECT

                 General Information about UN    Content of Film-Strip
Group     N         Mean*        SD                 Mean†       SD

E        11         2.43        0.77               11.75       2.09
C1       10         2.15        0.60               10.38       1.28
C2        7         2.31        0.92               12.41       0.92

* Based on five questions.
† Based on twenty questions.
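
As a rough check on the size of these differences, a t statistic can be
formed from the means, standard deviations, and numbers of classes in
Table I. The article does not say which test was used or how the SDs
were computed, so the unequal-variance form below is an assumption, and
the results need not reproduce the probabilities reported in the text.

```python
from math import sqrt


def unequal_variance_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t statistic for two independent means without pooling the variances."""
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# (mean, SD, number of classes) on the twenty film-strip items, from Table I.
E, C1, C2 = (11.75, 2.09, 11), (10.38, 1.28, 10), (12.41, 0.92, 7)

t_e_c1 = unequal_variance_t(*E, *C1)
t_c2_c1 = unequal_variance_t(*C2, *C1)
t_e_c2 = unequal_variance_t(*E, *C2)
print(round(t_e_c1, 2), round(t_c2_c1, 2), round(t_e_c2, 2))
```

The ordering of the three statistics matches the text: the C2 and C1
comparison is the largest, the E and C1 comparison is smaller, and the
E and C2 comparison is less than one standard error from zero.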








TABLE II. JUDGMENTS OF INTEREST VALUE OF FILM-STRIP AND OWN
MOTIVATION TO LEARN

A. Mean per cent per group judging the film-strip “very interesting”
or “somewhat interesting.”

Group     N      Per cent      SD*

E        11        50.4        9.04
C1       10        62.6        8.95
C2        7        57.8        9.03

B. Mean per cent per group who “tried very hard to learn” or “tried
somewhat hard to learn” while attending to the film-strip.

Group     N      Per cent      SD*

E        11        50.8        9.68
C1       10        62.0       13.60
C2        7        62.5       14.23

* Based on arc sine transformation.


Since different fact-quiz items tested knowledge of the structures 
and functions of the different parts of the UN, the mean of each 
group of questions related to a particular subdivision was computed. 
For each subset of questions, the mean for the E group is 
higher than the corresponding one for the C1 group. The differences 
are reliable for three of the six means. The same results are 
obtained for the comparisons between the means for the C2 and 
C1 groups. Although the C2 group is slightly superior to the E group 
on all of the subsets of questions, none of the differences between 
the paired means approaches significance. 

The after-questionnaire contained two opinion items on the Ss’ 
reactions to the film-strip. One asked them to indicate how 
interesting they thought the film-strip was; the other asked 
them to indicate their “felt” motivation-to-learn while attending 
to the film-strip. The data in Table II reveal that the percentages 
of Ss in the familiarization group who considered the film-strip 
interesting or who stated that they tried “hard” to learn were smaller 
than the corresponding ones for the Ss in the C1 and C2 groups. 
However, none of these differences reaches the 0.05 level, two tails. 
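
The standard deviations in Table II are reported on an arc sine scale. As an illustration only, the sketch below applies the conventional variance-stabilizing form, the angle whose sine is the square root of the proportion, expressed in degrees. The article does not state which variant or unit was used, the function name `arcsine_degrees` is hypothetical, and the class percentages in the example are invented for demonstration; they are not data from the study.

```python
import math
import statistics


def arcsine_degrees(percent):
    """Angular transform of a percentage: asin(sqrt(p)) in degrees.

    Assumption: this is the conventional form; the article does not
    specify the exact variant behind Table II.
    """
    p = percent / 100.0
    if not 0.0 <= p <= 1.0:
        raise ValueError("percent must lie between 0 and 100")
    return math.degrees(math.asin(math.sqrt(p)))


# Hypothetical class percentages (NOT data from the study)
class_percents = [35.0, 50.0, 62.5, 48.0, 71.0]
angles = [arcsine_degrees(x) for x in class_percents]

print("transformed values:", [round(a, 1) for a in angles])
print("SD on the angular scale:", round(statistics.stdev(angles), 2))
```

The transform stretches the ends of the percentage scale, so that class percentages near 0 or 100 contribute variability comparable to those near 50.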


DISCUSSION 


The results of the research confirm the prediction that prior 
familiarization with key terms will facilitate learning from an 
informational communication. Furthermore, the time devoted to the 
specialized training appears to be as advantageous as a second 
exposure to the film-strip. This is a significant pragmatic finding; 
specialized procedures which require extra time are often shown to 
improve learning in comparison with no specialized training, but 
not in comparison with extended practice on the task as another 
way of utilizing the extra time. In agreement with previous 
research, the data also confirm that two viewings of an audio-visual 
presentation lead to greater learning than does one viewing. 

The facilitating effect of the special training procedure can be 
attributed in part to the greater ease with which associations are 
developed and retained when familiar, discriminable stimuli are 
involved. Some form of this explanation is the one most often met 
with in the relevant literature (1, 2, 4, 6). However, the particular 
familiarization technique and informational communication used 
in this study present special considerations requiring additional 
interpretive assumptions. A kind of relevant S-R (picture-name) 
pre-training appears to be involved, with the R becoming a critical 
stimulus in the ideational learning. That is, when the film-strip 
describes a particular subdivision of the UN, a familiar frame is 
shown. Ss respond implicitly to the picture with its appropriate 
name. At the same time, the recorded voice mentions the name of 
the subdivision and relates descriptive, ideational content to it. 
To respond correctly on the fact-quiz test, S must retain the association 
between name and content in an essentially similar form. Often, 
the indefinite and non-discriminating referent word, “it,” is 
used as the subject of statements detailing critical attributes of 
the organs. However, the presence of the picture may keep active 
the implicit mediating response of the critical name; this leads to 
a stronger connection between the descriptive content and the name 
than would otherwise occur. It is presumed that this mediating 
effect perseverates throughout the description of the subdivision, 
even when unfamiliar frames are shown. When the next major unit 
of the UN is discussed, a second familiar frame is shown; and its 
identifying name is evoked. However, the prior discrimination 
learned between the names will facilitate the mnemonic separation 
between the content referring to the immediately preceding sub- 
division and that relating to the new one. That is, prior discrimina- 
tion among stimuli may aid the organization of associated content 
into retentive units that are less vulnerable to interfering associa- 
tions with related content. 

Despite the lack of differences significant at the 0.05 level, 
consideration of the opinion data leads to a suggestive inference. If 
the negative influence of the familiarization procedure on 
motivation-to-learn from the film-strip could be counteracted or reduced, 
a greater gain as a result of the pre-training might be found. 


SUMMARY 


The primary purpose of the research was to test the prediction 
that prior familiarization with key terms will facilitate learning 
from an informational communication. A second objective was to 
compare the value of two exposures to the communication with 
one exposure preceded by familiarization training. Three groups 
were formed: group C1 was exposed only once to a film-strip on the 
UN; group C2 saw the film-strip twice; and group E saw the film-strip 
once but prior to that underwent a familiarization procedure, 
in which the Ss learned to associate the names of the six main organs 
of the UN with pictures of them taken from the film-strip. 
Following exposure to the film-strip, all Ss responded to a question- 
naire containing a series of multiple-choice items relating to the 
content of the film-strip, some questions testing general informa- 
tion about the UN, and a few opinion questions. 

The main results were: 

(1) The E group was superior to the C1 group in information 
learned from the film-strip. Thus, the major prediction that 
stimulus-familiarization would facilitate learning was confirmed. 

(2) The E group did not differ reliably from the C2 group in 
learning from the film-strip. Thus, the familiarization training 
appears to be as advantageous as a second viewing of the film-strip. 


REFERENCES 


1) R. M. Gagné and K. E. Baker, “Stimulus Pre-Differentiation as a 
Factor in Transfer of Training,” Journal of Experimental Psychology, 
40: 439-451, 1950. 

2) C. I. Hovland and K. H. Kurtz, “Experimental Studies in Rote-Learning 
Theory: X. Pre-Learning Syllable Familiarization and the 
Length-Difficulty Relationship,” Journal of Experimental Psychology, 
44: 31-39, 1952. 

3) D. H. Lawrence, “Acquired Distinctiveness of Cues: I. Transfer 
between Discriminations Based on Familiarity with the Stimulus,” Journal 
of Experimental Psychology, 39: 770-784, 1949. 

4) D. H. Lawrence, “Acquired Distinctiveness of Cues: II. Selective 
Association in a Constant Stimulus Situation,” Journal of Experimental 
Psychology, 40: 175-188, 1950. 

5) D. E. McAllister, “The Effects of Various Kinds of Relevant Verbal 
Pre-Training on Subsequent Motor Performance,” Journal of Experimental 
Psychology, 46: 329-336, 1953. 

6) I. L. Rossman and A. E. Goss, “The Acquired Distinctiveness of Cues: 
the Rôle of Discriminative Verbal Responses in Facilitating the Acquisition 
of Discriminative Motor Responses,” Journal of Experimental Psychology, 
42: 173-182, 1951. 

7) R. L. Solomon and D. H. Howes, “Word Frequency, Personal Values, 
and Visual Duration Thresholds,” Psychological Review, 58: 256-270, 1951. 








BOOK REVIEWS 


HERBERT A. THELEN. Dynamics of Groups at Work. Chicago: Uni- 
versity of Chicago Press, 1954, pp. 366. $6.00. 


Since the death of Lewin in 1947, most investigation of group 
processes has consisted of differentiation and exploration of cells 
within larger frameworks, and perhaps rightly so. The extravagant 
promise of much of the early exploration has given way to sober 
work on focused problems. However, this has meant that relatively 
less attention has been given to comprehensive theory-building. 

The author of this book has characteristically been interested in 
integrative concepts, and has made periodic appearances in the 
literature with pieces of what promised to be a useful set of notions 
about group life. This book presents such a framework, drawn from 
eight years’ work at the Human Dynamics Laboratory at the 
University of Chicago. 

The approach is essentially molar, drawing systematic empirical 
conclusions from experience in specified settings (e.g., classroom 
teaching, training, community action), and developing explanatory 
notions to unify these. The book can be said to be non-Lewinian— 
in the topological sense—although the author’s debts to Lewin are 
clear. It is probably one of the best over-all treatments of group 
life now available. Thelen has drawn most directly on Deweyan 
problem-solving and general psychiatric theory, notably that of 
W. R. Bion, to produce an approach highly integrative of the 
“thought-feeling-action” trichotomy. A weakness of Dewey’s was 
his failure to understand the psychiatric dimensions of interper- 
sonal transactions; the weakness of his followers has been a tend- 
ency to ideological preachment. This book avoids both these traps, 
most clearly the former. 

The first half of the book deals with “technologies,” sets of 
“principles useful to bring about change toward desired ends.” 
Successively, community action groups, classroom teaching, fac- 
ulty in-service training, administrative decision-making, group 
development training, and large public meetings are examined, and 
proposed models are presented for each of these, often with con- 
siderable freshness and ingenuity. 

The second section presents conceptual material which serves 
to explain and amplify the material in the first section, and relate 
it to a larger framework. The chapters deal with the individual as 
he functions in groups (“membership,” and the thought-feeling-action 
relationship); the reality factors within and surrounding the 
group; the development of group control systems; leadership; and 
(a severely neglected area in this field) the community as the 
“context of group operation.” 

A number of recurrent themes appear: that experimental 
(feedback-using) method is most desirable; that group behavior represents 
a mixture of work and emotionality, the result of member 
attempts to work on private problems, build and maintain a group, 
and meet the demands of an objective task; that the notion of 
“membership” is crucial for understanding anyone’s behavior; and, 
finally, that we must think of the “reality-centered” group, working 
autonomously to meet intra-member, intra-group, and inter-group 
demands. 

Many useful and relatively novel ideas are presented. The notion 
of the “bridging” group as a means of decreasing inter-group conflict 
is one, as is the ingenious model of organizational decision-making 
which successfully separates line and staff (“pressure” and 
“coöperation”) functions, while integrating task effort. The “principles 
underlying the control of the group,” in Chapter 10, comprise an 
image of group operation that has not had wide currency; the case 
study of the trainer rôle is equally provocative. The community-centered 
model of faculty in-service training, the description of 
community-level processes of communication and communion, and 
the discussion of pressure-oriented and coöperation-oriented groups 
in the community promise to be useful in correcting the current 
tendency to focus only on intra-group processes. Finally, the book 
reminds us constantly, between the lines, that all groups have a 
training dimension, and that improvement of group operation may 
consist largely of enlarging and legitimizing this dimension. 

The book (as the author wryly remarks at the end) is honey- 
combed with lists of generalizations and principles. Some of these, 
such as the suggestions for composing subgroups, the ideas on 
transfer of training, and the discussion of inter-organizational con- 
flict, are now in directly testable form. Others are parts of system- 
atic models, and would require considerable translation before 
assuming the form of hypotheses. 

The style is compact, frequently abstract, rarely unnecessarily 
complex (as in some earlier publications by the author), often 
epigrammatic, and sometimes funny. The cross-referencing is 
extremely thorough, and there is a good index. 

This is a needed book. One might wish for more references to the 
molecular-level work done at the Laboratory; and readers who have 
not seen the Human Dynamics Laboratory monograph on Methods 
for Studying Work and Emotionality in Group Operation (1954) may 
complain of the imprecision of the book. It is certainly true that 
the parts of the general theory presented, while internally well re- 
lated, are by no means parsimonious. In a real sense, this book 
approaches the limits of what can be done, theorywise, with ordi- 
nary language, and it is possible that the direction from here lies 
in increasingly abstract and mathematical formulations. Finally, 
careful testing and reconstruction of the theory presented is as 
much needed now as was integrative, molar work prior to the 
publication of this book. 

MATTHEW B. MILES 
Teachers College, Columbia University. 


PAUL WITTY. How to Become a Better Reader. Chicago: Science 
Research Associates, Inc., 1953, pp. 304. 


This text is designed to help people, in school or in work situ- 
ations, to improve rate of reading, comprehension, and joy of read- 
ing. It is organized as a self-teaching book. The first part consists 
of twenty lessons. This is followed by a classified bibliography of 
books the person may want to read. The last part of the book is 
made up of twenty exercises with tests. One of these is to follow 
each lesson in the first part of the book. 

The early lessons should provide motivation to improve reading. 
They are concerned with benefits of reading, possibility of improve- 
ment, needs, nature of eye-movements, and reading for a purpose. 
The subsequent lessons deal with improving various reading skills 
such as rate, skimming, finding the main idea, details, evaluation, 
improving vocabulary, study-type reading, and how to keep on 
reading better. 

Each lesson and accompanying exercise is carefully and effec- 
tively organized so that the reader knows just how to proceed and 
how to measure progress in speed, comprehension and vocabulary 
knowledge. The exposition is clear, and the reading selections are 
well-chosen. 










The book provides a means for the person who is strongly motivated 
to help himself to better reading. Although it is a self-help text, 
best results will probably be obtained when it is used under the 
supervision of a reading specialist. 

MILES A. TINKER 
University of Minnesota 





























